Nvidia Software Manager AI Infrastructure System
United States, California
235489708

08.07.2025

US, CA, Santa Clara

time type: Full time

posted on: 6 Days Ago

job requisition id

Looking for an AI Infrastructure System Software Manager … continuously working to provide better tools to build and manage this … system … the ability to … out a long-term maintenance strategy.

What you'll be doing:

Mentor, grow, and develop a world-class team of AI infrastructure engineers.
Work across several teams and orgs to build products that use LLMs and agent systems to serve the needs of NVIDIA engineering teams. In this role, you will collaborate with research and infra teams and serve a large user base (hardware/software teams across NVIDIA).
Align priorities across collaborators and define metrics for measuring the success of the product/team.
Develop and execute strategies for scalable, reliable, and secure AI infrastructure supporting both research and production workloads.
Ensure robust monitoring, logging, visualization, and alerting capabilities to guarantee promised uptime and operational excellence.
Architect, design, develop, and maintain infrastructure and large-scale applications for LLM-based solutions. Optimize these systems for performance, scalability, reliability, and secure data management.
Stay updated with the latest trends in AI, ML, and infrastructure, proactively seeking opportunities to integrate advancements into Nvidia’s LLM and AI infrastructure solutions.

What we need to see:

10+ years of overall industry experience in large distributed system software development.
BS+ degree in CS or related/equivalent experience.
5+ years of experience managing AI and SW development teams.
Familiarity with modern software development stacks and tools, including containerization, cloud or on-premises deployments, API integration for seamless model operation, and real-time processing frameworks.
Experience in developing and maintaining LLM or GenAI infrastructure.
Excellent communication, collaboration, and problem-solving skills, with a dedication to encouraging an inclusive and diverse workplace.
Hands-on experience developing large-scale distributed systems

Ways to stand out from the crowd:

Strong technical background in cloud/distributed infrastructure
Experience debugging functional and performance issues in HPC GPU clusters
Background in running and instrumenting distributed LLM training on a multi GPU HPC cluster
Experience with HPC schedulers such as Slurm

You will also be eligible for equity and .
