What you will be doing:
Collaborate with multi-functional teams to analyze, co-design, and develop networking software and hardware for innovative AI platforms.
Drive the development of new networking algorithms and protocols for point-to-point and collective operations at scale.
Identify bottlenecks and inefficiencies in application code, proposing optimizations to enhance performance and network utilization.
Design and implement performance benchmarks and testing methodologies to evaluate performance at scale.
Provide guidance and recommendations for optimizing AI applications for speed, scalability, and resource efficiency.
Share knowledge with domain expert teams as they develop applications for the next generation of AI platforms.
Contribute to the development of tools and frameworks to facilitate network optimization.
What We Need to See:
PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
10+ years of experience with a focus on high-performance networking and AI applications.
Expertise in RDMA networking (InfiniBand, RoCE), Ethernet, and PCIe.
Experience with at least one high-performance networking library: NCCL, UCX, libfabric, MPI, UCC.
Deep understanding of various aspects of high-performance networking, including network technologies, debugging, and performance analysis.
Experience in developing and optimizing deep learning frameworks such as PyTorch and TensorFlow.
Proficiency in Python and C/C++.
Experience in CUDA programming.
Track record of delivering performance improvements for software used in large-scale deployments.
Knowledge of Kubernetes (k8s) and cloud-native application principles is a plus.
Familiarity with continuous integration and delivery practices for performance optimization.
Ways to stand out from the crowd:
Hands-on experience in optimizing networking building blocks for DL frameworks like PyTorch and TensorFlow.
Experience in developing communication libraries such as NCCL, UCX, UCC, MPI.
In-depth knowledge of RDMA, GPU-Direct, and network technologies.
Provide references to your code contributions.
You will also be eligible for equity and .