NVIDIA Principal Engineer, Systems Software
United States, California 
817194785

01.09.2024

What you’ll be doing:

  • Architecting the product to discover cluster resources such as hosts, GPUs, and switches, and to automate debugging and repair actions on these resources

  • Designing the platform to support GPU clusters across different cloud service providers (CSPs) and platforms such as Kubernetes and Slurm

  • Developing a distributed workflow execution runtime for parallel, fault-tolerant actions on a large number of resources

  • Operating critical software services with high availability and reliability for customers

  • Influencing the product roadmap in collaboration with teams across various departments with the goal of reducing SRE toil and improving hardware utilization

  • Optimizing system performance to increase scalability and improve the user experience

  • Leading and delivering high-impact projects with high quality, performance, and stability at the lowest resource consumption

  • Elevating the productivity and creativity of the technical staff by optimizing engineering practices, guiding junior engineers, and providing quality design and code reviews

  • Programming in systems languages like Go and Rust

What we need to see:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (or equivalent experience)

  • 15 years of relevant experience

  • Demonstrated ability to build scalable and robust distributed systems

  • Proven track record of rolling out products and collaborating with early adopters

  • Proficiency in Go, Rust, C/C++, or Java

  • Experience providing technical stewardship of projects across the organization

Ways to stand out from the crowd:

  • Deep understanding of concurrency and distributed systems concepts

  • Experience handling large, complex systems

  • Experience with SRE, DevOps, and platforms

You will also be eligible for equity and benefits.