
NVIDIA Senior On-Device Model Inference Optimization Engineer
United States, Texas
Job ID: 322010857
Posted: 01.12.2024

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.

What you'll be doing:
  • Develop and implement strategies to optimize AI model inference for on-device deployment.

  • Employ techniques like pruning, quantization, and knowledge distillation to minimize model size and computational demands (see the sketch after this list).

  • Optimize performance-critical components using CUDA and C++.

  • Collaborate with cross-functional teams to align optimization efforts with hardware capabilities and deployment needs.

  • Benchmark inference performance, identify bottlenecks, and implement solutions.

  • Research and apply innovative methods for inference optimization.

  • Adapt models for diverse hardware platforms and operating systems with varying capabilities.

  • Create tools to validate the accuracy and latency of deployed models at scale with minimal friction.

  • Recommend and implement model architecture changes to improve the accuracy-latency balance.
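
A minimal sketch of two of the compression techniques named above, assuming a toy PyTorch model (the layer sizes and the 30% pruning ratio are illustrative, not from the posting): magnitude pruning followed by post-training dynamic quantization.

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # Toy stand-in for a real network; sizes are illustrative.
    model = nn.Sequential(
        nn.Linear(512, 512),
        nn.ReLU(),
        nn.Linear(512, 10),
    )

    # Magnitude pruning: zero the 30% of weights with the smallest
    # L1 magnitude in each Linear layer, then bake the mask in.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.3)
            prune.remove(module, "weight")

    # Post-training dynamic quantization: weights stored as int8,
    # activations quantized on the fly at inference time.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    with torch.no_grad():
        print(quantized(torch.randn(1, 512)).shape)  # torch.Size([1, 10])

In practice the pruning ratio and the set of quantized layers are tuned against an accuracy budget; knowledge distillation would additionally train the smaller model to match a larger teacher's outputs.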

What we need to see:
  • MSc or PhD in Computer Science, Engineering, or a related field, or equivalent professional experience.

  • 5+ years of demonstrated experience specializing in model inference and optimization.

  • 10+ years of overall work experience in a relevant area.

  • Expertise in modern machine learning frameworks, particularly PyTorch, ONNX, and TensorRT (a small export-and-benchmark sketch follows this list).

  • Proven experience in optimizing inference for transformer and convolutional architectures.

  • Strong programming proficiency in CUDA, Python, and C++.

  • In-depth knowledge of optimization techniques, including quantization, pruning, distillation, and hardware-aware neural architecture search.

  • Skilled in building and deploying scalable, cloud-based inference systems.

  • Passionate about developing efficient, production-ready solutions with a strong focus on code quality and performance.

  • Meticulous attention to detail, ensuring precision and reliability in safety-critical systems.

  • Strong collaboration and communication skills for working effectively across multidisciplinary teams.

  • A proactive, diligent mindset and a drive to tackle complex optimization challenges.
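
Relatedly, a minimal sketch of the export-and-benchmark loop implied by the framework and benchmarking bullets above, again with an illustrative toy model: export from PyTorch to ONNX, run it under ONNX Runtime (the same .onnx file could instead be fed to TensorRT's ONNX parser), and report a latency percentile. The file and tensor names here are assumptions, not from the posting.

    import time

    import numpy as np
    import onnxruntime as ort
    import torch
    import torch.nn as nn

    # Illustrative model and input shape.
    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
    dummy = torch.randn(1, 512)

    # Export the graph to ONNX.
    torch.onnx.export(model, dummy, "model.onnx",
                      input_names=["input"], output_names=["logits"])

    session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    feed = {"input": dummy.numpy()}

    # Warm up, then time repeated runs; percentiles are usually more
    # informative than the mean when chasing latency targets.
    for _ in range(10):
        session.run(None, feed)
    latencies = []
    for _ in range(100):
        start = time.perf_counter()
        session.run(None, feed)
        latencies.append((time.perf_counter() - start) * 1e3)
    print(f"p50 latency: {np.percentile(latencies, 50):.3f} ms")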

Ways to stand out from the crowd:
  • Publications or industry experience in optimizing and deploying model inference at scale.

  • Hands-on expertise in hardware-aware optimizations and accelerators such as GPUs, TPUs, or custom ASICs.

  • Active contributions to open-source projects focused on inference optimization or machine learning frameworks.

  • Experience in designing and deploying inference pipelines for real-time or autonomous systems.

You will also be eligible for equity and benefits.