Nvidia Senior On-Device Model Inference Optimization Engineer

Location: China, Beijing
Time type: Full time
Posted: 6 Days Ago
Job requisition id: 919717667

What you'll be doing:

  • Develop and implement strategies to optimize AI model inference for on-device deployment.

  • Employ techniques like pruning, quantization, and knowledge distillation to minimize model size and computational demands (a minimal quantization sketch follows this list).

  • Optimize performance-critical components using CUDA and C++.

  • Collaborate with cross-functional teams to align optimization efforts with hardware capabilities and deployment needs.

  • Benchmark inference performance, identify bottlenecks, and implement solutions.

  • Research and apply innovative methods for inference optimization.

  • Adapt models for diverse hardware platforms and operating systems with varying capabilities.

  • Create tools to validate the accuracy and latency of deployed models at scale with minimal friction.

  • Recommend and implement model architecture changes to improve the accuracy-latency balance.
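
By way of illustration, here is a minimal sketch of one technique named above: post-training dynamic quantization of linear layers in PyTorch. The TinyHead module, its layer sizes, and the input shape are hypothetical placeholders for illustration, not part of the role description.

```python
# Minimal sketch: post-training dynamic quantization in PyTorch.
# TinyHead and all shapes below are hypothetical, for illustration only.
import torch
import torch.nn as nn

class TinyHead(nn.Module):
    """Hypothetical linear-heavy subgraph standing in for a real model."""
    def __init__(self, dim: int = 256, vocab: int = 1000):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, vocab)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.relu(self.proj(x)))

model = TinyHead().eval()

# Quantize nn.Linear weights to int8; activations are quantized
# dynamically at inference time (CPU execution).
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
with torch.inference_mode():
    baseline = model(x)
    compressed = quantized(x)

# Sanity-check that accuracy drift stays small after quantization.
print("max abs diff:", (baseline - compressed).abs().max().item())
```

Dynamic quantization is shown because it needs no calibration data; static quantization and quantization-aware training are the heavier-weight options on the same spectrum.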

What we need to see:

  • MSc or PhD in Computer Science, Engineering, or a related field, or equivalent experience.

  • 5+ years of demonstrated experience specializing in model inference and optimization.

  • 8+ years of overall work experience in a relevant field.

  • Expertise in modern machine learning frameworks, particularly PyTorch, ONNX, and TensorRT (see the export sketch after this list).

  • Proven experience in optimizing inference for transformer and convolutional architectures.

  • Strong programming proficiency in CUDA, Python, and C++.

  • In-depth knowledge of optimization techniques, including quantization, pruning, distillation, and hardware-aware neural architecture search.

  • Skilled in building and deploying scalable, cloud-based inference systems.

  • Passionate about developing efficient, production-ready solutions with a strong focus on code quality and performance.

  • Meticulous attention to detail, ensuring precision and reliability in safety-critical systems.

  • Strong collaboration and communication skills for working effectively across multidisciplinary teams.

  • A proactive, diligent mindset with a drive to tackle complex optimization challenges.
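
As a concrete illustration of the PyTorch/ONNX workflow named above, here is a minimal export-and-verify sketch. The tiny convolutional model, the "model.onnx" file name, and the shapes are hypothetical, and it assumes the onnxruntime package is installed; a TensorRT engine would typically be built from the exported file in a separate step.

```python
# Minimal sketch: export a PyTorch model to ONNX and cross-check the
# outputs with ONNX Runtime. Model, shapes, and file name are
# hypothetical, for illustration only.
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval()
dummy = torch.randn(1, 3, 32, 32)

torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)

# Run the exported graph and compare against the PyTorch reference.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
(onnx_out,) = session.run(None, {"input": dummy.numpy()})
torch_out = model(dummy).detach().numpy()
print("max abs diff:", np.abs(onnx_out - torch_out).max())
```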

Ways to stand out from the crowd:

  • Publications or industry experience in optimizing and deploying model inference at scale.

  • Hands-on expertise in hardware-aware optimizations and accelerators such as GPUs, TPUs, or custom ASICs.

  • Active contributions to open-source projects focused on inference optimization or machine learning frameworks.

  • Experience in designing and deploying inference pipelines for real-time or autonomous systems (a minimal latency-benchmark sketch follows below).
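
Finally, since the role centers on benchmarking inference and identifying bottlenecks, here is a minimal latency-measurement sketch. The warmup and iteration counts, the toy model, and the reported percentiles are all illustrative assumptions, not a prescribed methodology.

```python
# Minimal sketch: wall-clock latency benchmark with warmup and
# percentile reporting. All counts and the toy model are illustrative.
import statistics
import time
import torch
import torch.nn as nn

def benchmark(model: nn.Module, example: torch.Tensor,
              warmup: int = 10, iters: int = 100) -> dict:
    model.eval()
    with torch.inference_mode():
        for _ in range(warmup):       # warm caches, autotuners, allocators
            model(example)
        if example.is_cuda:
            torch.cuda.synchronize()  # don't time queued async GPU work
        times_ms = []
        for _ in range(iters):
            start = time.perf_counter()
            model(example)
            if example.is_cuda:
                torch.cuda.synchronize()
            times_ms.append((time.perf_counter() - start) * 1e3)
    times_ms.sort()
    return {
        "p50_ms": statistics.median(times_ms),
        "p99_ms": times_ms[int(0.99 * (len(times_ms) - 1))],
    }

model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))
print(benchmark(model, torch.randn(8, 512)))
```

Reporting percentiles rather than a mean matters for real-time pipelines, where tail latency, not average latency, is usually what breaks the budget.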