What you'll be doing:
Developing Kubernetes integration in our Linux-based cluster management software product. You will enable customers to set up, manage, and monitor Kubernetes deployments on their BCM clusters.
Integrating other NVIDIA components into Base Command Manager.
Ensuring that various types of workloads can easily use GPUs through Kubernetes or other workload management systems such as Slurm.
Developing Kubernetes operators to support different types of workloads in Kubernetes.
Following the latest developments in the area of Kubernetes.
Assisting the support team with Kubernetes-specific support tickets that require specialized expertise.
Working with the latest hardware (e.g. GPUs, AI accelerators, and high-speed interconnects such as InfiniBand and Spectrum-X) and software technologies such as parallel filesystems (e.g. Lustre, GPFS, BeeGFS, WekaIO), Ceph, Jupyter, and various ML frameworks and tools.
What we need to see:
Degree in Computer Science or related field.
Fluency in C++ and/or Python
Experience with concurrent programming techniques
7+ years of relevant experience, ideally in the area of systems programming
In-depth knowledge of Linux and Kubernetes
Ways to stand out from the crowd:
Experience with high-performance computing and system administration would be an asset
Experience with Slurm
Background in Go (Golang) would be beneficial