What You’ll Be Doing:
Define, develop, and execute cutting-edge benchmarks and workloads to analyze system performance, identify bottlenecks, and drive optimizations across our hardware and software stack.
Drive the direction of our future products by performing deep-dive analysis of system architectures and solutions to assess their performance, efficiency, and value proposition.
Develop and validate sophisticated performance and network simulation models, correlating them with real-world hardware to predict and analyze the behavior of future systems.
Analyze and optimize the entire AI stack, from communication libraries (such as NCCL) and system software down to the underlying network fabric, and develop Proof-of-Concepts (POCs) for new features and improvements.
Conceptualize next-generation networking architectures driven by emerging DL and AI technologies.
Collaborate with cross-functional teams, including other architecture teams, logic design, system software, firmware, and DL research teams, to ensure the successful execution of our vision.
What We Need To See:
M.Sc. or Ph.D. degree in Computer Science, Computer Engineering, or Electrical Engineering, or equivalent experience.
6+ years of relevant industry or research experience in high-performance computing, computer architecture, or computer networks.
Excellent understanding of large-scale system behavior and the effect of distributed computing workloads on network and system performance.
Proven experience in simulation-based performance analysis or benchmarking.
Exceptional analytical, problem-solving, and systems-thinking skills, with the ability to translate complex technical data into strategic architectural insights.
Hands-on programming skills in Python and/or AI frameworks for system analysis, automation, and modeling.
Ability to thrive in a fast-paced, dynamic environment and work concurrently with multiple groups across the organization.
Ways To Stand Out From The Crowd:
Expertise in the architecture and system-level requirements of large-scale, distributed DL workloads (e.g., LLMs, Generative AI for vision).
Deep understanding of communication libraries such as NCCL, UCX, or UCC.
Expertise in network protocols (Ethernet, InfiniBand, RoCE) and large-scale network topologies.
Experience with industry-standard AI benchmarks (e.g., MLPerf) and NVIDIA's frameworks (e.g., NeMo) on large-scale clusters.