Nvidia AI ML Performance Engineer 
United States, Washington 
709344553

15.07.2025
US, WA, Redmond
Time type: Full time
Posted on: Posted 5 Days Ago
Job requisition ID:

What You’ll Be Doing:

  • Develop high-fidelity performance models to prototype emerging algorithmic techniques in Generative AI to drive model-hardware co-design.

  • Design targeted optimizations for inference deployment to push the Pareto frontier of accuracy, throughput, and interactivity.

  • Quantify the performance benefit of targeted optimizations to prioritize features and guide the future software and hardware roadmap.

  • Model the end-to-end performance impact of emerging GenAI workflows, such as agentic pipelines and inference-time compute scaling, to guide datacenter design and optimization.

  • This position requires you to keep up with the latest DL research and collaborate with diverse teams, including DL researchers, hardware architects, and software engineers.

What we need to see:

  • A Master's degree (or equivalent experience) in Computer Science, Electrical Engineering, or a related field.

  • Strong background in computer architecture, roofline modeling, queuing theory and statistical performance analysis techniques.

  • Solid understanding of LLM internals (attention mechanisms, FFN structures), model parallelism and inference serving techniques.

  • 3+ years of hands-on experience in system evaluation of AI/ML workloads or performance analysis, modeling and optimizations for AI.

  • Proficiency in Python (and optionally C++) for simulator design and data analysis.

  • Growth mindset and pragmatic “measure, iterate, deliver” approach.

Ways to Stand Out from the Crowd:

  • Comfortable defining metrics, designing experiments and visualizing large performance datasets to identify resource bottlenecks.

  • Proven track record of working in cross-functional teams, spanning algorithms, software and hardware architecture.

  • Ability to distill complex analyses into clear recommendations for both technical and non-technical stakeholders.

  • Experience with GPU computing (CUDA).

  • Experience with deep learning frameworks such as PyTorch, TRT-LLM, vLLM, and SGLang.

You will also be eligible for equity and benefits.