What you will be doing:
Research new communication technologies (e.g., expanding the GPUDirect technology portfolio) and design new features for our communication libraries
Propose innovative HW and SW solutions for our next-generation platforms. You will co-design these solutions with the GPU, Networking, and SW architects and ensure seamless integration with the software stacks
Inspire changes based on quantitative data from proofs of concept or detailed technical analysis and modeling
Drive the adoption of new communication technologies across application verticals
Keep up with the latest DL research and collaborate with diverse internal and external teams, including DL researchers and customers
What we need to see:
PhD in Computer Science, Computer Engineering, or a related field, or strong equivalent experience; 15+ years of relevant experience in academia or industry
Expertise in the following areas: HPC, parallel programming models (MPI, SHMEM), at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC), computer and system architecture, GPU architecture, and CUDA
Deep understanding of high-performance networking gained through prior work experience: network technologies (InfiniBand, Ethernet), network design, network topologies, and network debugging and performance analysis
Strength in at least a few of these areas: ML/DL fundamentals and how they tie to communication, parallel algorithms, fault tolerance and resiliency, competitive assessments, performance analysis and optimization of parallel applications on large clusters, and application development with DL frameworks (PyTorch, TensorFlow)
Programming fluency in C or C++ for systems software development
Flexibility to work and communicate effectively across different HW/SW teams and time zones
Ways to stand out from the crowd:
Industry-recognized leader in HPC/DL communications with a history of patents, publications, conference talks, and keynotes in areas relevant to this role
Influential role in industry standards (e.g., MPI, OpenSHMEM) and open-source software (e.g., PyTorch, UCX, Open MPI)
You will also be eligible for equity and benefits.