Expoint - all jobs in one place

NVIDIA Deep Learning Engineer - Distributed Task-Based Backends
United States, Texas 
792084289

01.09.2024

What You Will Be Doing:

  • Develop extensions to popular Deep Learning frameworks that enable easy experimentation with various parallelization strategies

  • Develop compiler optimizations and parallelization heuristics to improve the performance of AI models at extreme scales

  • Develop tools that enable performance debugging of AI models at large scales

  • Study and tune Deep Learning training workloads at large scale, including important enterprise and academic models

  • Support enterprise customers and partners to scale novel models using our platform

  • Collaborate with Deep Learning software and hardware teams across NVIDIA to drive development of future Deep Learning libraries

  • Contribute to the development of runtime systems that form the foundation of all distributed GPU computing at NVIDIA

What We Need To See:

  • BS, MS or PhD degree in Computer Science, Electrical Engineering or a related field (or equivalent experience)

  • 5+ years of relevant industry experience or equivalent academic experience after BS

  • Proficient with Python and C++ programming

  • Strong background in parallel and distributed programming, preferably on GPUs

  • Hands-on development skills using Machine Learning frameworks (e.g. PyTorch, TensorFlow, JAX, MXNet, scikit-learn, etc.)

  • Understanding of Deep Learning training in distributed contexts (multi-GPU, multi-node)

Ways To Stand Out From The Crowd:

  • Experience with deep-learning compiler stacks such as XLA, MLIR, TorchDynamo

  • Background in performance analysis, profiling and tuning of HPC/AI workloads

  • Experience with CUDA programming and GPU performance optimization

  • Background with tasking or asynchronous runtimes, especially data-centric initiatives such as Legion

  • Experience building, debugging, profiling and optimizing multi-node applications on supercomputers or in the cloud

You will also be eligible for equity and .