Expoint - all jobs in one place

Nvidia AI Network System Architect 
Israel, North District 
22635993

02.05.2024
What You’ll Be Doing:
  • Investigating emerging technologies and methodologies in ML and AI to understand how they interact with network infrastructure.

  • Executing workloads on AI systems, conducting profiling, and analyzing bottlenecks and possible enhancements.

  • Conducting research and implementing optimizations for communication libraries like NCCL and UCX.

  • Spearheading the conceptualization of next-generation networking products tailored to support and accelerate state-of-the-art ML workloads.

  • Developing models for simulations, analyzing simulation results, and developing optimization algorithms.

  • Collaborating with cross-functional teams, including other architecture teams, logic design, system software, firmware, and ML research teams, to ensure the successful execution of the project.

What We Need To See:
  • M.Sc. or Ph.D. degree in Computer Science, Computer Engineering, or Electrical Engineering.

  • At least 2 years of industry or research experience in computer networks.

  • Extensive expertise in ML/AI workloads, particularly in distributed training.

  • Excellent understanding of large-scale network behavior and the effect of distributed computing workloads on the network.

  • Experience in the development of simulation environments.

  • Great problem-solving and critical-thinking skills.

  • Ability to thrive in a fast-paced, dynamic environment.

  • Ability to work concurrently with multiple groups in the organization.

Ways To Stand Out From The Crowd:
  • Knowledge of communication libraries such as NCCL, UCX, and UCC.

  • Good knowledge of network protocols, such as InfiniBand, IP, TCP, and RoCE, and of network topologies.

  • Experience with Python, C++, and Docker.

  • Expertise in systems engineering, operations research, and complex integrated hardware-software systems.

  • Demonstrated experience with DLRM, LLM, or other generative AI workloads.