Expoint - all jobs in one place



Nvidia Senior HPC Engineer, Infrastructure Specialist Team
United States, Texas 
48243077

24.06.2024

c, cand government team that requires excellent interpersonal skills. This role will interact with customers and implement large-scale AI/HPC projects. These efforts include a combination of networking, system design, automation, and validation.

What you will be doing:

  • Primary responsibilities will include deploying, managing, and validating AI/HPC infrastructure in Linux-based environments for new and existing customers.

  • Be the domain expert with customers from planning calls through implementation.

  • Deliver handover documentation and perform the knowledge transfers required to support customers as they begin rolling out some of the most sophisticated systems in the world!

  • Provide feedback to

What we need to see:

  • 5+ years of providing in-depth support and deployment services, solving problems for hardware and software products.

  • Knowledge of and experience with Linux system administration: process management, package management, task scheduling, kernel management, boot procedures and troubleshooting, performance reporting, optimization, and logging, and network routing and advanced networking (tuning and monitoring).

  • Experience with cluster management technologies (bonus credit for BCM, Base Command Manager).

  • Minimum of a four-year degree from an accredited university or college in Computer Science, Electrical Engineering, or Computer Engineering, or equivalent experience.

  • Scripting proficiency (Bash, Python, Ansible, etc.).

  • Excellent interpersonal skills and the ability to resolve customer issues as they arise.

  • Strong organizational skills and the ability to prioritize and multitask with limited supervision.

  • Experience with schedulers such as SLURM, LSF, and UGE.

  • A willingness to travel to customer sites within the United States.

  • Automation tooling background (Ansible, Puppet, etc.).

  • Experience with benchmarking tools such as HPL, NCCL tests, and MLPerf.

  • Kubernetes experience.
Ways to stand out from the crowd:

  • InfiniBand experience.

  • Experience with GPU (Graphics Processing Unit) focused hardware and software.

  • Experience with MPI (Message Passing Interface).

  • Storage technologies such as Lustre or GPFS.

  • Familiarity with Dell and Supermicro GPU platforms.

You will also be eligible for equity and .