Learn about GPU communication runtimes for deep learning frameworks (e.g., NCCL for TensorFlow/PyTorch) and HPC programming interfaces (e.g., NVSHMEM and UCX for MPI/OpenSHMEM) on GPU clusters; a brief illustrative sketch of this kind of runtime in use follows this list.
Perform in-depth performance characterization of one of these runtimes to identify new features.
Enhance the existing CI/CD infrastructure.
Design, implement, and maintain system software that enables interactions among GPUs and between GPUs and other system components.
Create proofs of concept to evaluate and motivate extensions to programming models, new designs in runtimes, and new features in hardware.
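
For context on the kind of communication runtime named above, here is a minimal sketch of a single-process NCCL all-reduce across two GPUs on one node. It is illustrative only and not part of the role description: it assumes a machine with at least two NVIDIA GPUs, the CUDA toolkit, and NCCL installed, and the file name (e.g., allreduce.cu, built with nvcc allreduce.cu -lnccl) is hypothetical.

    #include <cuda_runtime.h>
    #include <nccl.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define CHECK_CUDA(cmd) do { cudaError_t e = (cmd); if (e != cudaSuccess) { \
        fprintf(stderr, "CUDA: %s\n", cudaGetErrorString(e)); exit(1); } } while (0)
    #define CHECK_NCCL(cmd) do { ncclResult_t r = (cmd); if (r != ncclSuccess) { \
        fprintf(stderr, "NCCL: %s\n", ncclGetErrorString(r)); exit(1); } } while (0)

    int main(void) {
        const int nDev = 2;              // assumes two GPUs on this node
        const size_t count = 1 << 20;    // elements in each per-GPU buffer

        int devs[2] = {0, 1};
        ncclComm_t comms[2];
        float *sendbuff[2], *recvbuff[2];
        cudaStream_t streams[2];

        // Allocate a send/receive buffer and a stream on each GPU.
        for (int i = 0; i < nDev; ++i) {
            CHECK_CUDA(cudaSetDevice(i));
            CHECK_CUDA(cudaMalloc((void **)&sendbuff[i], count * sizeof(float)));
            CHECK_CUDA(cudaMalloc((void **)&recvbuff[i], count * sizeof(float)));
            CHECK_CUDA(cudaMemset(sendbuff[i], 0, count * sizeof(float)));
            CHECK_CUDA(cudaStreamCreate(&streams[i]));
        }

        // One NCCL communicator per GPU, all owned by this single process.
        CHECK_NCCL(ncclCommInitAll(comms, nDev, devs));

        // Group the per-GPU calls so NCCL issues them as one collective.
        CHECK_NCCL(ncclGroupStart());
        for (int i = 0; i < nDev; ++i)
            CHECK_NCCL(ncclAllReduce(sendbuff[i], recvbuff[i], count, ncclFloat,
                                     ncclSum, comms[i], streams[i]));
        CHECK_NCCL(ncclGroupEnd());

        // Wait for the collective to finish, then release resources.
        for (int i = 0; i < nDev; ++i) {
            CHECK_CUDA(cudaSetDevice(i));
            CHECK_CUDA(cudaStreamSynchronize(streams[i]));
            CHECK_CUDA(cudaFree(sendbuff[i]));
            CHECK_CUDA(cudaFree(recvbuff[i]));
            CHECK_CUDA(cudaStreamDestroy(streams[i]));
            CHECK_NCCL(ncclCommDestroy(comms[i]));
        }
        printf("all-reduce across %d GPUs complete\n", nDev);
        return 0;
    }

Multi-node deployments typically run one process per GPU and create communicators with ncclCommInitRank using a unique ID distributed by MPI or a job launcher; the single-process form above simply keeps the sketch self-contained.
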
What we need to see:
Pursuing M.S./Ph.D. degree in CS/CE.
Excellent C/C++ programming and debugging skills.
Strong experience with Linux.
Expert understanding of computer system architecture and operating systems.
Experience with parallel programming interfaces and communication runtimes.
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
Deep understanding of technology and passion for what you do.
Experience with CUDA programming and NVIDIA GPUs.
Knowledge of high-performance networks such as InfiniBand, iWARP, etc.
Experience with HPC applications.
Experience with deep learning frameworks such as PyTorch, TensorFlow, etc.
Strong collaborative and interpersonal skills, specifically a proven ability to effectively guide and influence within a dynamic matrix environment.