Expoint - all jobs in one place


NVIDIA Technical Program Manager - Scale Engineering
United States, Texas 
890097400

12.05.2024

NVIDIA is looking for a highly motivated Technical Program Manager (TPM) to join our Applied Systems Engineering Team and drive the design process for the next generation of NVIDIA AI supercomputing systems. The TPM plays a crucial role in defining requirements and trade-offs in the design of the latest AI systems at scale, covering all layers of the stack, from datacenter and network architecture through hardware design and systems software.

What you’ll be doing:

  • Collaborate with outstanding engineers and architects to build and deploy large-scale GPU computing systems based on NVIDIA's reference supercomputing architectures

  • Define key product requirements and specifications to drive collaboration with architecture leads, systems engineers, and other program managers

  • Track the development of upcoming server, networking, and storage technologies across multiple product roadmaps to feed into integrated datacenter systems

  • Coordinate programs for designing new cluster architectures, adapting them to changing market requirements, and translating those designs into deployed computing systems for production use

  • Document system designs to facilitate the teamwork of multiple engineering groups working on datacenter deployments at scale

  • Communicate internally with engineering leadership to prioritize and address key issues essential to the success of our largest customers

What we need to see:

  • BS (Master's preferred) in Applied Science or Engineering (or equivalent experience)

  • 5+ years of overall experience

  • Experience with accelerated computing systems, high-performance computing, and Linux-based operating systems

  • A passion for understanding challenging technical problems and driving the process of finding a solution

  • Strong teamwork and interpersonal skills for building collaborative workflows that coordinate many teams

Ways to stand out from the crowd:

  • Understanding of datacenter design, including familiarity with power and cooling technologies

  • Experience building and using large-scale cloud computing systems

  • Experience working with the engineering or academic research community supporting high-performance computing or deep learning

You will also be eligible for equity.