NVIDIA Senior Network Software Engineer
United States, Texas 
961985739

24.06.2024

What you will be doing:

  • Collaborate with multi-functional teams to analyze, co-design, and develop networking software and hardware for innovative AI platforms.

  • Drive the development of new networking algorithms and protocols for point-to-point and collective operations at scale.

  • Identify bottlenecks and inefficiencies in application code, proposing optimizations to enhance performance and network utilization.

  • Design and implement performance benchmarks and testing methodologies to evaluate performance at scale.

  • Provide guidance and recommendations for optimizing AI applications for speed, scalability, and resource efficiency.

  • Share knowledge with domain expert teams as they develop applications for the next generation of AI platforms.

  • Contribute to the development of tools and frameworks to facilitate network optimization.

What we need to see:

  • PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.

  • 10+ years of experience with a focus on high-performance networking and AI applications.

  • Expertise in RDMA networking (InfiniBand, RoCE), Ethernet, and PCIe.

  • Experience with at least one high-performance networking library: NCCL, UCX, libfabric, MPI, UCC.

  • Deep understanding of various aspects of high-performance networking, including network technologies, debugging, and performance analysis.

  • Experience in developing and optimizing deep learning frameworks such as PyTorch and TensorFlow.

  • Proficiency in Python and C/C++.

  • Experience in CUDA programming.

  • Track record of delivering performance improvements for software used in large-scale deployments.

  • Knowledge of Kubernetes (k8s) and cloud-native application principles is a plus.

  • Familiarity with continuous integration and delivery practices for performance optimization.

Ways to stand out from the crowd:

  • Hands-on experience in optimizing networking building blocks for DL frameworks like PyTorch and TensorFlow.

  • Experience in developing communication libraries such as NCCL, UCX, UCC, MPI.

  • In-depth knowledge of RDMA, GPUDirect, and network technologies.

  • References to your code contributions.

You will also be eligible for equity and benefits.