NVIDIA Senior Deep Learning Software Engineer, Inference Model Optimization
United States, Texas 
829574397

01.12.2024

What you’ll be doing:

  • Train, develop, and deploy state-of-the-art generative AI models, such as LLMs and diffusion models, using NVIDIA's AI software stack.

  • Leverage and build upon the PyTorch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze arbitrary PyTorch models and extract a standardized graph representation from them for our automated deployment solution.

  • Develop high-performance inference optimization techniques, such as automated model sharding (e.g., tensor parallelism and sequence parallelism), efficient attention kernels with KV caching, and more.

  • Collaborate with teams across NVIDIA to integrate performant kernel implementations into our automated deployment solution.

  • Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.

  • Continuously improve inference performance to ensure that NVIDIA's inference software solutions (TensorRT, TensorRT-LLM, TensorRT Model Optimizer) maintain and extend their leadership in the market.

  • Play a pivotal role in architecting and designing a modular, scalable software platform that delivers an excellent user experience, broad model support, and a wide range of optimization techniques to drive adoption.
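Tensor parallelism, one of the model sharding techniques listed above, splits a layer's weights across devices so each device computes part of the output. The following is a minimal sketch of the column-parallel case for a single linear layer, assuming plain Python lists stand in for per-GPU tensors; `matmul`, `shard_columns`, and `column_parallel_linear` are hypothetical names for illustration, not NVIDIA or PyTorch APIs.

```python
# Illustrative sketch of column-wise tensor parallelism (hypothetical helpers).
# Plain Python lists stand in for the weight shards each GPU would hold.

def matmul(x, w):
    """Multiply an (n x k) matrix x by a (k x m) matrix w."""
    return [[sum(x[i][p] * w[p][j] for p in range(len(w)))
             for j in range(len(w[0]))]
            for i in range(len(x))]

def shard_columns(w, num_shards):
    """Split w's output columns into num_shards contiguous, equal-width shards."""
    m = len(w[0])
    assert m % num_shards == 0, "output dim must divide evenly across shards"
    width = m // num_shards
    return [[row[s * width:(s + 1) * width] for row in w]
            for s in range(num_shards)]

def column_parallel_linear(x, w, num_shards):
    """Each hypothetical 'device' computes x @ w_shard independently; the
    partial outputs are then concatenated along the output dimension,
    which plays the role of the all-gather in a real multi-GPU setup."""
    partials = [matmul(x, shard) for shard in shard_columns(w, num_shards)]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, -1], [0, 1, 1, 3]]
print(column_parallel_linear(x, w, 2))  # [[1, 2, 4, 5], [3, 4, 10, 9]]
```

Because each output column depends only on its own weight column, the sharded result matches the unsharded `matmul(x, w)` exactly. Real implementations also use row-parallel layers and collective communication, which this sketch omits.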
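KV caching, listed above among the attention optimizations, avoids recomputing the keys and values of earlier tokens at every step of autoregressive decoding. The following is a single-head, pure-Python illustration of the idea under that assumption. `attend` and `KVCache` are hypothetical names, not part of TensorRT-LLM or any NVIDIA API.

```python
import math

def attend(q, keys, values):
    """Scaled dot-product attention for one query vector over given keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(wt * v[j] for wt, v in zip(weights, values)) / total
            for j in range(len(values[0]))]

class KVCache:
    """Append-only cache of past keys and values for one attention head."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, q, k, v):
        # Store the new token's key/value once, then attend causally over
        # every cached position up to and including the current token.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)

cache = KVCache()
qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ks = [[1.0, 0.5], [0.2, 1.0], [0.3, 0.3]]
vs = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
for t in range(3):
    print(t, cache.step(qs[t], ks[t], vs[t]))
```

Each decoding step adds only one new key/value pair, yet produces the same output as recomputing attention over the full prefix. Production kernels apply the same idea to batched, multi-head tensors in GPU memory.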

What we need to see:

  • Master's degree, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.

  • 5+ years of relevant work or research experience in Deep Learning.

  • Excellent software design skills, including debugging, performance analysis, and test design.

  • Strong proficiency in Python, PyTorch, and related ML tools (e.g., Hugging Face).

  • Strong algorithms and programming fundamentals.

  • Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.

Ways to stand out from the crowd:

  • Contributions to PyTorch, JAX, or other Machine Learning Frameworks.

  • Knowledge of GPU architecture and the compilation stack, and the ability to understand and debug end-to-end performance.

  • Familiarity with NVIDIA's deep learning SDKs such as TensorRT.

  • Prior experience writing high-performance GPU kernels for machine learning workloads using CUDA, CUTLASS, or Triton.

You will also be eligible for equity and benefits.