What you will be doing:
Understand, analyze, profile, and optimize deep learning inference workloads on state-of-the-art hardware and software platforms.
Collaborate with researchers and engineers across NVIDIA, providing guidance on improving the performance of workloads.
Implement production-quality software across NVIDIA's deep learning platform stack, such as TensorRT, TensorRT-LLM, and vLLM.
Build tools to automate workload analysis, workload optimization, and other critical workflows.
What we want to see:
5+ years of relevant professional experience.
MSc or PhD in CS, EE, or CSEE, or equivalent experience.
Strong background in deep learning and neural networks, particularly training and inference optimization.
Deep understanding of computer architecture, and familiarity with the fundamentals of GPU architecture.
Proven experience analyzing, modeling, and tuning application performance.
Programming skills in C++ and Python.
Ways to stand out from the crowd:
Strong fundamentals in algorithms.
Experience with production deployment of deep learning models, especially LLMs and multimodal models.
Proven experience with processor and system-level performance modeling.
GPU programming experience (CUDA or OpenCL) is a strong plus but not required.
You will also be eligible for equity and benefits.