Expoint - all jobs in one place

Finding a high-tech job at the best companies has never been easier

Nvidia Senior System Software Engineer – DC Platform Tools 
United States, California 
159834498

14.04.2025
US, CA, Santa Clara
Time type: Full time
Posted: 4 days ago

System Software Engineer – DC Platform SW Tools. NVIDIA Grace and GPU superchips provide the performance and productivity required for strong scaling of HPC and generative AI workloads. Scale-out is inherent to the design of this massive superchip.

As a Senior System Software Engineer in Platform Software, you will be responsible for the design, development, and deployment of data center tools. The primary focus of these tools is to provide a simple user experience across the data center manageability life cycle, from deployment through production to repair workflows. You will work closely with cross-functional teams, including hardware engineers, system architects, software developers, and customers, to gather requirements, create solutions, and provide an end-to-end manageability experience.


What you'll be doing:

  • Drive next-generation GPU server software manageability workflows for scaling AI infrastructure in data centers. This infrastructure includes DGX, HGX, and MGX products. You will be involved in ensuring that the proper tools are built for managing server software and firmware across the data center lifecycle.

  • Work with internal and external customers to understand requirements for various tools to improve debuggability, serviceability and runtime of data center firmware and software.

  • Contribute to all phases of product development, from product definition, architecture, and design, through implementation, debugging, testing and early customer support.

  • Maintain detailed documentation of tool designs, capabilities, and usage guidelines. Provide regular reports and technical insights to internal teams on the effectiveness and improvements of developed tools.

  • Define KPIs for tools and work with various stakeholders to improve them over time.

What we need to see:

  • BS, MS, or PhD in EE/CS or related field of education (or equivalent experience) with 10+ years of experience

  • Proven record of working on management solutions for large-scale clusters in data centers.

  • Strong and demonstrable skill in Python

  • Programming and debugging experience for large-scale data centers.

  • Experience in SCM (e.g., Git, Perforce) and project management tools like Jira.

  • Excellent written and oral communication skills, a strong work ethic, a deep sense of teamwork, a love of producing quality work, and a commitment to finishing your tasks every single day.

  • You are a self-starter who loves to find creative solutions to complicated problems and is hands-on with coding.

Ways to stand out from the crowd:

  • Worked on data center deployment and management projects.

  • Hands-on experience with x86 or ARM system architecture.

  • Familiarity with processor microarchitecture concepts such as caches, pipelining, memory hierarchy, and instruction set architecture (ISA). Experience with code coverage and static analysis tools.

We have some of the most forward-thinking and hardworking people on the planet working for us.

You will also be eligible for equity.