Expoint – all jobs in one place

NVIDIA: Senior Systems Engineer, NVIDIA Mission Control
United States, California 
233536666

Today
Location: US, CA, Santa Clara
Time type: Full time
Posted on: Yesterday
Job requisition ID:

It offers an enterprise-grade, full-stack solution, optimized for large-scale AI training and inference, and is available through partnerships with leading cloud service providers.

What you'll be doing:

  • Making the existing cluster automation platform more fault-tolerant, agile, hardware/networking aware, and resource-efficient

  • Enabling AI capabilities in the platform to enhance the user experience and accelerate automation, diagnosis, and remediation of issues

  • Integrating with the ecosystem tools to enable a rich, unified user experience with full end-to-end capabilities

  • Collaborating with various stakeholders across NVIDIA to understand business context, influence the product roadmap, help with adoption of the automation platform, and reduce toil for managing clusters

  • Operating critical software services with high availability and reliability

  • Programming in systems languages like Rust and Go

  • Driving engineering best practices, mentoring engineers, and fostering an inclusive team culture

What we need to see:

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience)

  • Keen interest in driving agentic AI projects

  • 10 years of equivalent experience

  • Demonstrated ability in building scalable, agile, and robust distributed systems

  • Successful product rollouts and collaboration with early adopters

  • Technical leadership and ownership of projects across the organization

  • Hands-on approach, passion for continuous improvement, and willingness to get involved in all aspects of development

  • Experience working with ambiguity and driving clarity in complex technical decisions

Ways to stand out from the crowd:

  • Skilled in using AI to scale team productivity and agility

  • Experience with revamping complex systems with existing customers to take them to the next level

  • Experience with SRE, DevOps, CI/CD, and a variety of platforms

You will also be eligible for equity and .