
Nvidia Principal Software Engineer - DGX Cloud Kubernetes Runtime Team 
United States, Texas 
219837540

10.11.2025
US, WA, Remote
US, CA, Remote
Time type: Full time
Posted on: Posted 3 Days Ago

What you will be doing:

  • Design and implement the runtime controller system that manages the lifecycle of runtime packages across thousands of Kubernetes clusters without manual pipeline intervention

  • Build and maintain the runtime builder that packages, validates, and distributes GPU operators, DRA drivers, network components, and other accelerated compute runtime packages

  • Develop Kubernetes controllers, CRDs, and operators that automate runtime installation, upgrade, and rollback operations through API-driven workflows

  • Create expansion rules and component management systems that enable flexible runtime composition across different cloud providers and GPU architectures

  • Work with internal teams to migrate from GitLab pipeline-based deployments to fully automated, controller-powered runtime management

What we need to see:

  • Experience building production Kubernetes systems, with deep expertise in controllers, operators, and CustomResourceDefinitions (CRDs)

  • Strong proficiency in Go and experience building scalable Go services that manage complex distributed systems

  • Hands-on experience with Helm, Kustomize, and managing Kubernetes manifest packaging and templating at scale

  • Deep understanding of Kubernetes architecture including API machinery, admission controllers, and resource lifecycle management

  • Demonstrated ability to design and implement automation systems that replace manual processes with reliable, self-service tooling

  • Master's degree and/or PhD in Computer Science, or equivalent experience

  • 15+ years of professional experience, including at least 4 years of Kubernetes development experience

Ways to stand out from the crowd:

  • Experience building multi-tenant platform services with a focus on API design, versioning, and backward compatibility

  • Deep familiarity with OCI registries, artifact signing, SBOM generation, and supply chain security practices

  • Experience working with GPU operators, device plugins, or other hardware acceleration components in Kubernetes

  • Track record of migrating legacy systems to modern, automated platforms while maintaining zero-downtime operations

  • Contributions to upstream Kubernetes projects or experience extending Kubernetes API machinery

You will also be eligible for equity and benefits.