Expoint - all jobs in one place

Nvidia Systems Software Engineer - NIM Factory Platforms 
United States, Texas 
635187867

01.12.2024

What you'll be doing:

  • Develop, analyze and optimize factory infrastructure that takes an AI model in and produces a deployable service validated across Cloud, On-prem and Kubernetes environments. With the team, define and deliver rapid iterations on the group's technical strategies and roadmaps to deliver and improve the NIM factory. You will develop harnesses, automate hardware acceptance, analyze benchmarks, gather data, and perform statistical analysis of system health and performance analysis of NIMs.

  • Work with technical leaders to design and develop scalable, reliable factory acceptance and performance tuning for hardware platforms. You will collaborate with multiple AI model teams to understand their requirements and build an efficient infrastructure that improves every team's productivity.

  • You will define metrics and drive improvements based on user feedback. You will mentor and collaborate within the team and with other teams to help your colleagues and yourself grow. You will have a history of learning and of growing your own skills and those of the people around you.

What we need to see:

  • A history of using your advanced programming skills to build tooling and automation for hardware system characterization and benchmarking.

  • Proven experience debugging and analyzing the performance of compute applications and systems

  • Deep technical expertise working with system software and platform layers including kernel, device drivers, memory, storage, networking and PCIe devices

  • Passion for building platform engineering components and automation of system benchmarking and characterization.

  • Excellent interpersonal skills and the ability to lead multi-functional efforts

  • Experience working with hardware clusters, distributed systems, networking, GPU interconnects (PCIe, NVLink), and node and cluster interconnects (InfiniBand)

  • BS or MS in Computer Science, Computer Engineering or related field (or equivalent experience)

  • 6+ years of demonstrated experience developing performant microservices, cloud software and/or tooling

Ways to stand out from the crowd:

  • Experience delivering optimized system engineering environments for inference applications on data center and consumer-grade hardware platforms.

  • A history of building and deploying automated benchmarking solutions in Cloud and On-prem environments, and their associated CI/CD pipelines

  • Prior experience working with large-scale compute infrastructure solutions

You will also be eligible for equity and .