Expoint - all jobs in one place


Nvidia Senior HPC Engineer - Infrastructure Specialist Team 
United States, Texas 
719030400

31.07.2024

c, cand government team that requires excellent interpersonal skills. This role will be interacting with customers, and implement large-scale AI/HPC projects. These efforts include a combination of networking, system design, automation, and validation.

What you will be doing:

  • Primary responsibilities will include deploying, managing, and validating AI/HPC infrastructure in Linux-based environments for new and existing customers.

  • Be the domain expert with customers during planning calls through implementation.

  • Handover-related documentation and perform knowledge transfers required to support customers as they begin rolling out some of the most sophisticated systems in the world!

  • Provide feedback to

What we need to see:

  • 5+ years providing in-depth support and deployment services; solving problems for hardware and software products.

  • Knowledge and experience with Linux system administration, process management, package management, task scheduling, kernel management, boot procedures/troubleshooting, performance reporting/optimization/logging, network routing/advanced networking (tuning and monitoring).

  • Cluster management technologies (bonus credit for BCM (Base Command Manager)).

  • Minimum of a four-year degree from an accredited university or college in Computer Science, or Electrical or Computer Engineering, or equivalent experience.

  • Scripting proficiency (Bash, Python, Ansible, etc.).

  • Excellent interpersonal skills and the ability to deliver resolutions for customer issues as they arise.

  • Strong organizational skills and ability to prioritize/multi-task easily with limited supervision.

  • Experience with schedulers such as SLURM, LSF, UGE, etc.

  • A willingness to travel to customer sites within the United States.

  • Automation tooling background (Ansible, Puppet, etc.).

  • Experience with benchmarking tools such as HPL, NCCL tests, MLPerf.

  • Kubernetes experience.


Ways to stand out from the crowd:

  • InfiniBand experience.

  • Experience with GPU (Graphics Processing Unit) focused hardware/software.

  • Experience with MPI (Message Passing Interface).

  • Storage technologies such as Lustre or GPFS.

  • Familiarity with Dell and Supermicro GPU platforms.

You will also be eligible for equity and .