
Nvidia Senior DevOps Engineer - AI AV Infrastructure 
China, Shanghai 
329360928

China, Shanghai
China, Beijing
China, Shenzhen
Time type: Full time
Posted on: 20 days ago

This role offers a chance to start from the ground up: standing up new vendor-provided platforms, validating integration paths, and ensuring infrastructure is reliable, secure, and production-ready. You will combine hands-on engineering, infrastructure deployment, and workflow automation to help scale our AV validation ecosystem.

What You’ll Be Doing:

  • Deploy and operationalize vendor-provided platforms in our service cloud, starting with proof-of-concept environments to validate dependencies, workflows, and performance.

  • Build and maintain distributed infrastructure that supports large-scale log ingestion, data processing, and scenario validation.

  • Automate workflows and pipelines using Python, Bash, and Bazel to ensure reproducibility, efficiency, and reliable distributed execution.

  • Integrate simulation and drive logs (e.g., Parquet files, world model data) with validation platforms, ensuring seamless end-to-end coverage analysis.

  • Provide visualization and reporting capabilities to surface validation results, coverage metrics, and actionable insights for developers and stakeholders.

  • Define and manage access controls, monitoring, and security policies to ensure compliance while enabling smooth collaboration across internal and vendor teams.

  • Partner closely with internal teams and external vendors to troubleshoot issues, refine SLAs, and continuously improve operational reliability and scalability.

What We Need to See:

  • BS/MS in Computer Science, Engineering, or a related STEM field (or equivalent experience).

  • 5+ years of professional experience in infrastructure, distributed systems, or platform engineering.

  • Hands-on experience with Linux systems, Kubernetes/Docker, and CI/CD pipelines.

  • Strong scripting/development skills in Python and Bash, plus exposure to C++ and/or Go.

  • Familiarity with Bazel build/test automation frameworks.

  • Experience in data/log ingestion workflows and distributed compute/storage systems.

  • Strong debugging, problem-solving, and communication skills to work across internal and vendor teams.

Ways to Stand Out from the Crowd:

  • Prior experience with scenario-based validation platforms or AV simulation ecosystems. Experience with Foretellix is an added advantage.

  • Background in large-scale distributed systems or GPU/CPU cluster deployments. Strong knowledge of logging/monitoring/alerting frameworks (Prometheus, Grafana, ELK stack, etc.).

  • Experience working directly with external vendors to integrate platforms and operationalize SLAs.

  • Contributions to open-source projects in infrastructure automation, data pipelines, or validation tooling.

  • Proactive use of AI/ML techniques to accelerate log analysis, coverage metrics, or integration workflows.