Nvidia Senior Engineering Manager – AI Research Clusters 
United States, Texas 
361955560

02.07.2025
US, CA, Santa Clara
US, Remote
Time type: Full time
Posted: 7 Days Ago
What you'll be doing:

  • Lead the design and deployment of scalable storage systems optimized for AI workloads and high-performance compute clusters.

  • Drive readiness and operational enablement for upcoming hardware platforms, ensuring seamless integration and performance.

  • Coordinate the development of internal tools to enhance storage provisioning, usage traceability, and user self-service.

  • Guide the evaluation and implementation of new technologies to improve efficiency, reliability, and observability.

  • Collaborate with cross-functional teams to align storage architecture with GPU cluster requirements and evolving research needs.

  • Improve storage monitoring and metrics infrastructure to surface key insights and enable proactive management.

  • Find opportunities to modernize existing storage systems for improved quota management, compression, and automation.

What we need to see:

  • BS or equivalent experience.

  • 12+ years of overall relevant technical experience.

  • 5+ years of leadership experience.

  • Proven ability to lead engineering teams building infrastructure at scale, especially in environments combining storage and high-performance computing.

  • Deep technical knowledge of distributed storage systems, with experience improving data access patterns and platform observability.

  • Familiarity with the infrastructure deployment lifecycle, from planning and vendor engagement to rollout and operational readiness.

  • Strong understanding of how to align storage performance with compute needs and how to measure system behavior against real-world metrics.

  • Ability to guide teams through technology evaluations, balancing technical rigor with speed and pragmatism.

Ways to stand out from the crowd:

  • Experience with large-scale storage and networking systems in performance-sensitive environments such as HPC, AI, or scientific computing.

  • Success in building tools or automation for self-service, visibility, and governance in complex infrastructure environments.

  • Background in data observability and metrics correlation for infrastructure performance, cost efficiency, or capacity forecasting.

  • Experience leading teams through cross-functional technical evaluations or RFPs and turning them into successful infrastructure deployments.

  • Contributions to storage architecture improvements, including filesystem tuning, resource quota management, or data compression strategies.

You will also be eligible for equity and benefits.