Expoint - all jobs in one place

NVIDIA Senior System Software Architect HPC Networking
China, Beijing, Beijing 
244100533

01.12.2024

What you will be doing:

  • Create proofs-of-concept to evaluate and motivate extensions in AI frameworks (PyTorch/NeMo), new runtime designs, and new network hardware features.

  • Research, design, and implement features for AI and HPC communication middleware (NCCL, UCX, UCC) and deep learning frameworks such as TensorFlow and PyTorch.

  • Research, design, and develop hardware features relevant to scientific, deep learning, and data-intensive workloads.

  • Collaborate with customers to understand their needs and provide innovative solutions.

What we need to see:

  • Ph.D., Master's, or Bachelor's degree in computer science, computer engineering, electrical engineering, or a closely related field.

  • 5+ years of experience in DNNs, scaling of DNNs, parallelism of DNN frameworks, or deep learning training workloads.

  • Deep understanding of parallelism techniques, including data parallelism, pipeline parallelism, tensor parallelism, and FSDP.

  • Experience with AI network parallelism using collective libraries and RDMA/RoCE.

  • Background in algorithm design, system programming, and computer architecture.

  • Strong programming and software development skills.

  • Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.

Ways to stand out from the crowd:

  • Deep understanding of technology and passion for what you do.

  • Strong collaborative and interpersonal skills, specifically a proven ability to effectively guide and influence within a dynamic matrix environment.

  • Background in designing communication middleware for high-performance computing systems, including RoCE and DPUs.

  • Background in CUDA programming, NVIDIA GPUs, and programming models for emerging architectures.