Limitless High-tech career opportunities - Expoint

Nvidia Senior On-Device Model Inference Optimization Engineer
United States, Texas
322010857

01.12.2024

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.

What you'll be doing:

Develop and implement strategies to optimize AI model inference for on-device deployment.
Employ techniques like pruning, quantization, and knowledge distillation to minimize model size and computational demands.
Optimize performance-critical components using CUDA and C++.
Collaborate with cross-functional teams to align optimization efforts with hardware capabilities and deployment needs.
Benchmark inference performance, identify bottlenecks, and implement solutions.
Research and apply innovative methods for inference optimization.
Adapt models for diverse hardware platforms and operating systems with varying capabilities.
Create tools to validate the accuracy and latency of deployed models at scale with minimal friction.
Recommend and implement model architecture changes to improve the accuracy-latency balance.

What we need to see:

MSc or PhD in Computer Science, Engineering, or a related field, or equivalent professional experience.
Over 5 years of proven experience specializing in model inference and optimization.
10+ years of work experience in a relevant area.
Expertise in modern machine learning frameworks, particularly PyTorch, ONNX, and TensorRT.
Proven experience in optimizing inference for transformer and convolutional architectures.
Strong programming proficiency in CUDA, Python, and C++.
In-depth knowledge of optimization techniques, including quantization, pruning, distillation, and hardware-aware neural architecture search.
Skilled in building and deploying scalable, cloud-based inference systems.
Passionate about developing efficient, production-ready solutions with a strong focus on code quality and performance.
Meticulous attention to detail, ensuring precision and reliability in safety-critical systems.
Strong collaboration and communication skills for working effectively across multidisciplinary teams.
A proactive, diligent mindset with a drive to tackle complex optimization challenges.

Ways to stand out from the crowd:

Publications or industry experience in optimizing and deploying model inference at scale.
Hands-on expertise in hardware-aware optimizations and accelerators such as GPUs, TPUs, or custom ASICs.
Active contributions to open-source projects focused on inference optimization or machine learning frameworks.
Experience in designing and deploying inference pipelines for real-time or autonomous systems.

You will also be eligible for equity and benefits.
