What you will be doing:
Design, implement and maintain highly optimized communication runtimes for Deep Learning frameworks (e.g. NCCL for TensorFlow/PyTorch) and HPC programming interfaces (e.g. UCX for MPI/OpenSHMEM) on GPU clusters.
Participate in and contribute to parallel programming interface specifications like MPI/OpenSHMEM.
Design, implement and maintain system software that enables interactions among GPUs and interactions between GPUs and other system components.
Create proofs of concept to evaluate and motivate extensions in programming models, new designs in runtimes and new features in hardware.
What we need to see:
M.S./Ph.D. degree in CS/CE or equivalent experience.
5+ years of relevant experience.
Excellent C/C++ programming and debugging skills.
Strong experience with Linux.
Expert understanding of computer system architecture and operating systems.
Experience with parallel programming interfaces and communication runtimes.
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
Deep understanding of technology and passion for what you do.
Experience with CUDA programming and NVIDIA GPUs.
Knowledge of high-performance networks such as InfiniBand and iWARP.
Strong collaborative and interpersonal skills, specifically a proven ability to effectively guide and influence within a dynamic matrix environment.
You will also be eligible for equity and .