Limitless High-tech career opportunities - Expoint

Nvidia Senior Software Engineer Cloud-Native Stack – CSP Engagements
United States, Texas
581506607

12.08.2025

US, CA, Santa Clara

US, TX, Austin

US, WA, Redmond

US, WA, Seattle

time type: Full time

posted on: 4 days ago

job requisition id

We are developing advanced multi-rack, multi-tenant AI/ML datacenters with NVIDIA GB200 and upcoming GB300 GPUs. NVIDIA seeks a Senior Software Engineer for our CSP (Cloud Service Provider) Engagements team to focus on the cloud-native stack for datacenter products like GB200. In this role, you will define customer workflows, prototype stack enhancements, and debug the toughest Kubernetes + Slurm issues in these datacenters. You'll also tackle complex scheduling challenges across racks, tenants, and clouds.

What you’ll be doing:

Perform deep-dive debugging of multi-rack, multi-tenant clusters: scheduler behavior, container runtime issues, device-plugin crashes, RDMA/IB fabric anomalies, etc.
Gather customer requirements and prototype feature extensions for Kubernetes operators, Slurm plugins, and custom micro-services that expose new GPU capabilities.
Drive joint architecture reviews and “whiteboard” sessions with CSP and internal platform teams; convert findings into RFCs and upstream pull requests.
Create reproducible testbeds (Helm/Ansible/Terraform).
Deliver technical collateral (design docs, how-to guides, demo scripts) and present at customer on-sites, KubeCon, and SlurmUG.
Collaborate with AE, FAE, and Solution Architect teams to deliver integrated customer solutions and technical documentation.

What we need to see:

Strong source-level expertise in Kubernetes internals (scheduler, CRI/CNI/CSI, operators) and Slurm (federation, power-save, plugins).
Hands-on experience integrating next-gen GPUs (Blackwell/GB200/GB300) or comparable accelerators into containerized clusters.
Proven track record debugging large-scale, cloud-native stacks across networking (RDMA/RoCE), storage, and control planes.
Customer-facing engineering or solutions-architect background: requirements gathering, PoC ownership, roadmap influence.
Familiarity with CI/CD (GitHub Actions, Tekton), observability (Prometheus, OpenTelemetry), and infrastructure-as-code.
Excellent communication, with the ability to switch between deep technical detail and high-level business impact.
6+ years of professional software development experience in distributed systems (Go, Rust, C/C++ or Python for tooling).
BS or MS (or equivalent experience) in Computer Engineering, Computer Science, or a related field.

Ways to stand out from the crowd:

Upstream contributions to Kubernetes, Slurm, Volcano, or similar projects.
Experience with GPU computing (CUDA) and deep learning workloads.

You will also be eligible for equity and .
