Nvidia AI System Engineer – New College Grad 
United States, Texas 
727433829

26.08.2025
US, CA, Santa Clara
US, WA, Redmond
Time type: Full time
Posted: 5 days ago

What You’ll Be Doing:

  • Optimize inference deployment by pushing the Pareto frontier of accuracy, throughput, and interactivity at datacenter scale.

  • Develop high-fidelity performance models to prototype emerging algorithmic techniques and hardware optimizations, driving model-hardware co-design for generative AI.

  • Prioritize features to guide the future software and hardware roadmaps, based on detailed performance modeling and analysis.

  • Model the end-to-end performance impact of emerging GenAI workflows, such as agentic pipelines and inference-time compute scaling, to understand future datacenter needs.

  • Keep up with the latest DL research and collaborate with diverse teams, including DL researchers, hardware architects, and software engineers.

What We Need to See:

  • Pursuing or recently completed an MS or PhD degree (or equivalent experience) in Computer Science, Electrical Engineering, or a related field.

  • Strong background in computer architecture, roofline modeling, queuing theory and statistical performance analysis techniques.

  • Solid understanding of Machine Learning fundamentals, model parallelism and inference serving techniques.

  • Proficiency in Python (and optionally C++) for simulator design and data analysis.

Ways to Stand Out from the Crowd:

  • Experience in system evaluation of AI/ML workloads or performance analysis, modeling and optimizations for AI.

  • Comfortable defining metrics, designing experiments and visualizing large performance datasets to identify resource bottlenecks.

  • Proven track record of working in cross-functional teams, spanning algorithms, software and hardware architecture.

  • Ability to distill complex analyses into clear recommendations for both technical and non-technical stakeholders.

  • Experience with GPU computing (CUDA) or deep learning frameworks such as PyTorch, TRT-LLM, vLLM, and SGLang.

You will also be eligible for equity.