What you'll be doing:
Work closely with internal engineering and product teams and external app developers to solve local end-to-end AI GPU deployment challenges on the NVIDIA RTX AI platform.
Apply powerful profiling and debugging tools to analyze the most demanding GPU-accelerated end-to-end AI applications, detecting insufficient GPU utilization that results in suboptimal runtime performance.
Conduct hands-on training sessions, develop sample code, and host presentations that provide clear guidance on efficient end-to-end AI deployment targeting optimal runtime performance on NVIDIA ARM-based SoCs.
Contribute code to internal and external projects, including open source.
Collaborate with GPU driver and architecture teams as well as NVIDIA research to influence next generation GPU features by providing real-world workflows and giving feedback on partner and customer needs.
What we need to see:
5+ years of professional experience in local GPU deployment, profiling, and optimization.
BS or MS degree in Computer Science, Engineering, or a related field.
Strong proficiency in C/C++, software design, and programming techniques.
Familiarity with and development experience on the Windows operating system.
Experience with CUDA and NVIDIA's Nsight GPU profiling and debugging suite.
Strong verbal and written communication skills in English, solid organizational skills, and a logical approach to problem solving, time management, and task prioritization.
Excellent interpersonal skills.
Ways to stand out from the crowd:
Experience with GPU-accelerated AI inference driven by NVIDIA APIs, specifically cuDNN, CUTLASS, TensorRT.
Detailed knowledge of the latest generation GPU architectures.
Confirmed expert knowledge in Vulkan and/or DX12.
Experience with AI deployment on NPUs and ARM architectures.
Contributions to open source projects.