What You Will Be Doing:
Understand, analyze, profile, and optimize large language model training on state-of-the-art hardware and software platforms.
Understand the big picture of LLM training performance on GPUs, prioritizing and solving problems across state-of-the-art LLM variants, from research to industry.
Implement production-quality software in multiple layers of NVIDIA's deep learning platform stack, from drivers to DL frameworks.
Implement and simulate key LLM workload behaviors in NVIDIA's proprietary processor and system simulators to enable future architecture studies.
Build tools to automate workload analysis, workload optimization, and other critical workflows.
What We Need To See:
PhD (or equivalent experience) in CS, EE, or CSEE with 5+ years of relevant work experience, or MS with 8+ years.
Strong background in deep learning and neural networks, particularly in training and large language models.
Deep understanding of computer architecture, and familiarity with the fundamentals of GPU architecture.
Expertise in analyzing and tuning application performance, preferably on GPUs.
Familiarity with common deep learning software packages like PyTorch and JAX.
Prior experience with processor and system-level performance modelling.
Programming skills in C++, Python, and CUDA.
You will also be eligible for equity and benefits.