Expoint - all jobs in one place


NVIDIA AI HPC Cluster Group Manager
Israel, North District 
292455287

24.06.2024

AI & HPC Clusters Group Manager

What you’ll be doing:

  • Lead a group responsible for building, managing, and maintaining SW R&D clusters composed of Linux, Windows, and VMware systems, x86 and ARM CPUs, GPUs, and Ethernet and InfiniBand technologies.

  • Work closely with the engineering and architecture teams to understand, plan, and build new clusters for validating and testing new NVIDIA Networking technology solutions.

  • Drive the design and implementation of automated systems to deploy, configure, maintain, and monitor these clusters.

  • Drive the design and implementation of resource management systems that serve multiuser environments with differing needs on these clusters.

  • Manage the R&D lab, including inventory, power, space, and cooling.

  • Build, expand, and mentor the team to address growing demands and requirements.

  • Innovate! Influence NVIDIA Networking cluster management tools so that they shine in customers’ eyes.

What We Need to See:

  • A degree in Computer Science, Engineering, or a related field.

  • 5+ years of managerial experience, including managing other managers.

  • 10+ years of relevant overall professional experience.

  • Experience in data center management at a multidisciplinary company, including handling power, cooling, and space.

  • Experience in managing HPC/AI clusters.

  • Deep understanding of operating systems, computer networks, and high-performance hardware.

  • Deep knowledge of distributed resource scheduling systems and orchestration tools such as Slurm and Kubernetes (K8s).

  • Strong organizational and project management skills; comfortable multitasking in a dynamic environment with shifting priorities and changing requirements.

  • An enthusiastic and ambitious personality that encourages a positive and productive work environment.

Ways to Stand Out From the Crowd:

  • Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software.

  • Familiarity with CUDA and with managing GPU-accelerated computing systems.

  • Experience with and knowledge of InfiniBand.