What you'll be doing:
Lead and manage a high-performing team of software and site reliability engineers, guiding their professional growth and project execution while fostering a culture of innovation and excellence.
Oversee factory automation initiatives that streamline the development, deployment, and management of inference microservices across distributed environments.
Coordinate the development of infrastructure that ensures consistency, quality, and security for inference workload deployments at scale.
Collaborate with cross-functional teams to integrate the infrastructure into CI/CD pipelines, enabling seamless and efficient microservices delivery.
Build foundational distributed computing systems supporting the full lifecycle of inference microservices for NVIDIA's AI strategy.
Establish and enforce standards for infrastructure and application deployment, eliminating manual, ad-hoc processes through automation.
Work closely with security teams to ensure the platform's design and implementation are robust and secure, with a focus on authentication, authorization, and data protection.
Drive recruitment and mentorship efforts to build and maintain a top-tier engineering team.
What we need to see:
BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, other Engineering or related fields (or equivalent experience).
8+ years of software engineering experience with a focus on distributed systems, cloud infrastructure, or large-scale platform development.
3+ years of experience leading and managing high-performing teams of software and site reliability engineers.
Proven expertise with distributed computing technologies including Kubernetes, container orchestration, and experience with multi-cloud or hybrid-cloud environments.
Strong understanding of microservices architecture and experience building scalable, fault-tolerant distributed systems.
Experience with factory automation and infrastructure-as-code principles, including Temporal and automated deployment pipelines.
Excellent communication, leadership, and problem-solving skills with the ability to operate in a fast-paced, collaborative environment.
Proven ability to work effectively in remote and cross-functional teams.
Ways to stand out from the crowd:
Experience building platforms that support the full lifecycle of AI inference applications and microservices.
Deep understanding of inference workloads and their unique infrastructure requirements for low-latency, high-throughput processing.
Experience with NVIDIA hardware including GPUs, DPUs, and networking technologies for AI workloads.
Background in AI/ML infrastructure and understanding of model serving, inference optimization, and GPU utilization.
Experience in a large-scale, high-growth technology company with a proven track record of delivering software products.