Expoint - all jobs in one place

Finding a high-tech job at the best companies has never been easier


Nvidia: Senior Deep Learning Architect, LLM Inference
United States, California 
434093268

06.05.2025
US, CA, Santa Clara
Time type: Full time
Posted: 3 days ago

What you'll be doing:

  • Characterize the latest LLMs and inference servers such as vLLM and SGLang to ensure that TRT-LLM maintains its leadership position.

  • Join forces with the performance marketing team to build engaging content, including blog posts and other written materials, that highlights TRT-LLM's outstanding achievements.

  • Collaborate with engineers from AI startup companies to debug and establish standard methodologies.

  • Profile GPU kernel-level performance to identify hardware and software optimization opportunities.

  • Develop profiling and analysis software tools that can keep up with the rapid pace of network scaling.

  • Contribute to deep learning software projects, such as PyTorch, TRT-LLM, vLLM, and SGLang to drive advancements in the field.

  • Verify that TRT-LLM's performance meets expectations for new GPU product launches.

  • Collaborate across the company to guide the direction of inference serving, working with software, research, and product teams to ensure world-class performance.

What we need to see:

  • Master's or PhD degree in Computer Science, Computer Engineering, or related fields, or equivalent experience.

  • 6+ years of relevant industry experience.

  • Detailed knowledge of deep learning inference serving, PyTorch programming, profiling, and compiler optimizations.

  • Proficiency in Python and C++ programming languages and familiarity with CUDA.

  • Experience with LLMs and their performance challenges and opportunities.

  • Solid understanding of CPU and GPU microarchitecture and performance characteristics.

  • Experience with complex software projects like frameworks, compilers, or operating systems.

  • Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.

Ways to stand out from the crowd:

  • Demonstrate a drive to continuously improve software and hardware performance.

  • Showcase examples of novel use cases for agentic AI tools in the workplace.

  • Experience with database and visualization tools like D3.js will set you apart.

You will also be eligible for equity and benefits.