
Nvidia Senior Manager CSP Engagements – System Software SWAT Team 
United States, Texas 
28663610

10.11.2025
US, CA, Santa Clara
US, Remote
Time type: Full time
Posted: 2 days ago

What you’ll be doing:

  • Lead a cross-functional SWAT team focused on rapid triage, debugging, and resolution of complex system software issues for hyperscaler customers.

  • Drive technical incident response, war-room operations, and escalation management across firmware, Linux kernel, drivers, networking, virtualization, and observability layers.

  • Build and mentor a high-performing team of senior engineers; set operational standards for incident response, on-call rotations, and continuous improvement.

  • Serve as a primary technical and operational focal point for hyperscaler customers, managing expectations, communications, and participant relationships.

  • Collaborate with CSP technical leads, TPMs, and internal engineering teams to deliver customer-validated solutions and influence product quality and release criteria.

  • Operate customer-like labs to reproduce issues, validate fixes, and ensure robust telemetry and observability.

  • Provide executive-level status updates, risk assessments, and recommendations for critical customer issues.

What we need to see:

  • 12+ years of overall proven experience in system software (firmware, Linux kernel, drivers, networking, virtualization), with at least 5 years in data center or HPC software environments.

  • Bachelor's degree or equivalent experience.

  • 3+ years of direct experience working with hyperscalers in production environments.

  • 6+ years of management experience.

  • Proven leadership in managing customer escalations, technical incident response, and cross-functional teams.

  • Deep technical expertise in Linux kernel, device drivers, ARM (aarch64) & x86, OpenBMC/SBIOS, out-of-band/in-band management, DMTF protocols (Redfish, PLDM, MCTP, SPDM), and networking (TCP/IP, Ethernet, InfiniBand).

  • Strong customer management and team member engagement skills; ability to communicate complex technical issues to executive and engineering audiences.

  • Demonstrated success in reducing time-to-mitigation, improving release predictability, and driving continuous improvement in technical operations.

Ways to stand out from the crowd:

  • Experience building and operating customer-like labs, automation, and telemetry frameworks.

  • Familiarity with GPU computing (CUDA), large-scale AI/HPC workloads, NVLink, Grace, and cluster-level deployment/management.

  • Knowledge of CXL/memory fabric fundamentals and contributions to industry standards (OCP, DMTF).

You will also be eligible for equity.