Expoint – all jobs in one place
The place where the best experts and companies meet

Nvidia Software Engineer Fleet Health Instrumentation Intern - Fall 
United States, California 
127719794

20.05.2025
US, CA, Santa Clara
Time type: Full time
Posted on: 3 days ago
Job requisition ID:

What you will do:

  • Design and build software that collects, transforms, and publishes health data about our global GPU fleet.

  • Develop micro-services and data pipelines in Go or Python that ingest and normalize data from many diverse sources, routing millions of records per day (Kafka, Airflow, Kinesis).

  • Instrument production infrastructure and workloads running on Kubernetes and bare-metal clusters; add tracing and metrics hooks for deeper insights.

  • Automate deployments and testing with CI/CD (GitLab, Argo) and IaC (Terraform), ensuring repeatable, low-touch releases.

  • Participate in the full lifecycle of cloud services—from design docs and code reviews through deployment, monitoring, and continuous improvement.

  • Collaborate with other engineers to debug live issues and turn post-incident insights into durable code fixes.

  • Contribute to internal tooling and dashboards that help engineers visualize fleet health, utilization, and capacity trends.

What we need to see:

  • Actively pursuing a BS or MS in Computer Science, Computer Engineering, or a closely related quantitative field (e.g., Physics or Mathematics).

  • Solid understanding of distributed-systems fundamentals, modern software-engineering practices, and data-modeling principles.

  • Proficiency in at least one programming language, preferably Python or Go.

  • Working knowledge of Linux, basic networking concepts, and Kubernetes container orchestration.


Ways to stand out from the crowd:

  • A systematic, analytical problem-solving approach paired with clear written and verbal communication skills and a strong sense of ownership.

  • Demonstrated ability to debug, optimize, and automate code or workflows with minimal guidance.

  • Hands-on experience building, deploying, and operating services in a public-cloud or large on-prem environment.

You will also be eligible for Intern