

What you'll be doing:
Serve as the primary, high-impact contributor on complex features. Dedicate significant time to producing production code across the full stack, including UI, APIs, services, and infrastructure.
Code Review Leadership & Quality Assurance: Lead the code review process, setting and enforcing rigorous standards for coding, performance, and architectural integrity so that all merged code is high-quality, maintainable, and robust.
Architectural Ownership & Portability: Define and own the long-term technical roadmap, architecture, and design. This includes ensuring that the deployment pipelines and services are platform-agnostic and easily deployable across the broader NVIDIA ecosystem, deliberately avoiding internal infrastructure dependencies.
Foundation Model Deployment Strategy: Lead the strategic implementation of web services and efficient batch processing queues to seamlessly integrate and operationalize our world foundation models into the customer-facing platform.
System Performance & Reliability: Implement and enforce standards for production-grade performance, monitoring, and fault tolerance across all services. Proactively identify and resolve systemic technical debt and scalability bottlenecks.
Deployment & Operational Excellence: Take ultimate ownership of the CI/CD pipelines, container orchestration strategy (Kubernetes/Helm), and operational readiness, ensuring seamless scalability and reliability in production.
Team Mentorship & Guidance: Mentor and guide the engineering team on advanced practices in full-stack development, distributed systems design, performance optimization, and clean, portable code architecture.
Multi-functional Partnership: Act as the key technical liaison, translating complex requirements from Product Managers, ML Engineers, and Data Scientists into robust, portable, and implementable designs.
What we need to see:
This role requires a proven track record of significant experience and technical mastery:
12+ years of hands-on experience developing and deploying scalable full-stack web services in a cloud environment.
Proven Tech Lead or equivalent Senior/Staff level experience with demonstrated ability to define system architecture, mentor engineers, and take end-to-end technical ownership of a major platform while remaining deeply active in coding and code reviews.
Expert-level proficiency in designing and scaling distributed microservices architectures using gRPC and REST APIs.
Deep expertise in modern frontend frameworks and building highly responsive, data-intensive UIs capable of managing high-frequency data flows.
Direct experience designing and deploying containerized applications that use a GPU (e.g., NVIDIA Container Toolkit).
Experience with MaaS (Model-as-a-Service) patterns and serving large machine learning models as high-throughput endpoints.
Mastery of container orchestration, including Kubernetes and Helm for sophisticated, portable, multi-service production deployments.
Proficiency in backend languages such as Python and/or Go, and TypeScript for the frontend.
Strong practical experience with cloud infrastructure (e.g., AWS S3) and with managing complex data storage and access patterns (SQL, key-value stores).
Expertise in CI/CD practices (GitLab, Jenkins) with a focus on automation, testing, and improving deployment velocity and stability.
Bachelor's degree (B.S.) or equivalent experience in Computer Science, Software Engineering, Electrical Engineering, or a closely related technical field; Master's degree (M.S.) preferred.
Ways to stand out from the crowd:
These skills represent a strong alignment with our specific domain challenges:
Experience in data querying platforms such as Apache Druid, ClickHouse, or Elasticsearch.
Familiarity with autonomous vehicle simulation environments (e.g., CARLA) and synthetic data generation pipelines using foundation models.
You will also be eligible for equity and .