What we need to see:
Strong foundational expertise and a BS, MS, or PhD in Engineering, Computer Science, or a related field (or equivalent experience).
Established track record working with AI and HPC clusters, both on-premises and cloud-based.
4+ years of proven experience with cluster management and related tools, including Docker containers, Slurm, Kubernetes, and Ansible.
Hands-on experience configuring and debugging networks, storage, and clusters.
Strong analytical and problem-solving skills, along with an ability to articulate what you know to others.
Ability to multitask efficiently in a dynamic environment.
Ways to stand out from the crowd:
Strong coding and debugging skills, including experience with Python, C/C++, Bash, and Linux utilities.
Demonstrated expertise through projects or Open Source contributions involving GPU workloads, Kubernetes, InfiniBand, Ethernet, or other areas related to high-performance clusters and hybrid cloud solutions.
Hands-on experience with NVIDIA AI Enterprise, Base Command Manager, Run:ai, and NVIDIA NIMs.
Willingness and ability to learn quickly and solve advanced problems.
You will also be eligible for equity and benefits.