
Nvidia Senior HPC Architect Networking 
United States, Texas 
485250759

Today
US, CA, Santa Clara
US, CA, Remote
Time type: Full time
Posted on: 17 days ago
Job requisition ID:

We are looking for an outstanding hands-on architect/engineer for a Senior HPC Architect role to support the deployment and bring-up of large-scale GPU compute clusters. Be a key player in enabling the most exciting computing hardware and software, and contribute to the latest breakthroughs in artificial intelligence and GPU computing. Provide insights on, and implement, at-scale system administration and tuning mechanisms for large-scale compute runs. You will work with the latest accelerated computing and deep learning software and hardware platforms, and with many scientific researchers, developers, and customers, to craft improved workflows and develop new, leading, differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop, and bring up large-scale performance platforms.

What we need to see:

  • 5+ years of experience using accelerated computing for datacenter/HPC-based Enterprise computing solutions.

  • Solid understanding of accelerated computing scheduling and I/O stacks.

  • C/C++/Python/Bash programming/scripting experience.

  • Experience working with the engineering or academic research community in support of high-performance computing or deep learning.

  • Experience with parallel filesystems.

  • Strong teamwork and communication skills, both verbal and written.

  • Ability to multitask effectively in a dynamic environment.

  • Experience deploying and maintaining high-speed networks such as InfiniBand or Ethernet for compute and storage traffic.

  • Desire to be involved in multiple diverse and innovative projects.

  • BS (or equivalent experience) in Engineering, Mathematics, Physics, or Computer Science. MS or PhD desirable.

Ways to stand out from the crowd:

  • Hands-on experience debugging large-scale IB/Ethernet/NVLink fabrics.

  • Experience with Spectrum-X fabric deployments.

  • Deep learning framework skills.

  • Exposure to using and deploying telemetry and visualization pipelines.

  • Exposure to container technology and Linux performance tools.

You will also be eligible for equity and benefits.