Expoint - all jobs in one place

Nvidia Senior Deep Learning Software Engineer 
United States, California 
790041824

01.12.2024

What you’ll be doing:

  • Play a pivotal role in architecting and designing a modular and scalable software platform to provide an excellent user experience with broad model support and optimization techniques.

  • Leverage and build upon the PyTorch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract a standardized model graph representation from arbitrary PyTorch models for our automated deployment solution.

  • Develop high-performance optimization techniques for inference, such as automated model sharding (e.g., tensor parallelism, sequence parallelism), efficient attention kernels with KV caching, and more.

  • Collaborate with teams across Nvidia to use performant kernel implementations within the automated deployment solution.

  • Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.

  • Continuously innovate on inference performance to ensure Nvidia's inference software solutions (TRT, TRT-LLM, TRT Model Optimizer) maintain and extend their leadership in the market.

What we need to see:

  • Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.

  • 5+ years of relevant work or research experience in Deep Learning.

  • Excellent software design skills, including debugging, performance analysis, and test design.

  • Strong proficiency in Python, PyTorch, and related ML tools.

  • Strong algorithms and programming fundamentals.

  • Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.

Ways to stand out from the crowd:

  • Contributions to PyTorch, JAX, or other machine learning frameworks.

  • Knowledge of GPU architecture and the compilation stack, and the ability to understand and debug end-to-end performance.

  • Familiarity with Nvidia’s deep learning SDKs such as TensorRT.

  • Prior experience in writing high-performance GPU kernels for machine learning workloads in frameworks such as CUDA, CUTLASS, or Triton.

You will also be eligible for equity and benefits.