Expoint – all jobs in one place

NVIDIA Senior Deep Learning Software Engineer
United States, California
535374030

02.07.2025
US, CA, Santa Clara
Time type: Full time
Posted: 18 days ago

What you’ll be doing:

  • Play a pivotal role in defining a modular, scalable platform that bridges training and deployment workflows, enabling tight integration of deployment tooling with training frameworks such as Megatron and NeMo.

  • Leverage and build upon the PyTorch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze arbitrary PyTorch models and extract a standardized model graph representation from them for our automated deployment solution.

  • Develop support for inference optimization techniques such as speculative decoding and LoRA.

  • Collaborate with teams across NVIDIA to use performant kernel implementations within the automated deployment solution.

  • Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.

  • Continuously innovate on inference performance to ensure NVIDIA's inference software solutions (TRT, TRT-LLM, TRT Model Optimizer) maintain and extend their leadership in the market.

What we need to see:

  • Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.

  • 8+ years of relevant work or research experience in Deep Learning.

  • Excellent software design skills, including debugging, performance analysis, and test design.

  • Strong proficiency in Python, PyTorch, and related ML tools.

  • Strong algorithms and programming fundamentals.

  • Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.

Ways to stand out from the crowd:

  • Contributions to PyTorch, JAX, or other Machine Learning Frameworks.

  • Knowledge of GPU architecture and the compilation stack, and the ability to understand and debug end-to-end performance.

  • Familiarity with NVIDIA's deep learning SDKs such as TensorRT.

  • Prior experience writing high-performance GPU kernels for machine learning workloads with tools such as CUDA, CUTLASS, or Triton.

You will also be eligible for equity.