Engage closely with internal engineering teams and external partners on solving local end-to-end LLM & Generative AI GPU deployment challenges.
Apply powerful profiling and debugging tools to analyze the most demanding GPU-accelerated end-to-end AI applications and detect insufficient GPU utilization that results in suboptimal runtime performance.
Conduct hands-on trainings, develop sample code, and host presentations that give guidance on efficient end-to-end AI deployment targeting optimal runtime performance.
Guide developers of AI applications in applying methodologies for efficient adoption of DL frameworks, targeting maximal utilization of GPU Tensor Cores for the best possible inference performance.
Collaborate with GPU driver and architecture teams as well as NVIDIA research to influence next-generation GPU features by providing real-world workflows and giving feedback on partner and customer needs.
Deep theoretical knowledge of Transformer architectures, specifically LLMs and Generative AI, and of convolutional neural networks.
8+ years of professional experience in local GPU deployment, profiling, and optimization.
BS or MS degree in Computer Science, Engineering, or a related field.
Strong proficiency in C/C++, Python, software design, and programming techniques.
Experience working with AI inference frameworks.
Experience with CUDA and NVIDIA's Nsight GPU profiling and debugging suite.
Strong verbal and written communication skills in English and strong organizational skills, with a logical approach to problem solving, time management, and task prioritization.
Excellent interpersonal skills.
Some travel is required for conferences and for on-site visits with external partners.
Proficiency in GPU-accelerated AI inference driven by NVIDIA APIs, specifically cuDNN, TensorRT & TensorRT-LLM.
Experience with AI deployment on NPUs and ARM architectures.
Confirmed expert knowledge in Vulkan and/or DX12.
Detailed knowledge of the latest generation GPU architectures.