

What you’ll be doing:
Prototype end-to-end solutions to improve distributed training and disaggregated inference performance.
Analyze and optimize communication flows across application, transport, and network layers.
Develop system software spanning communication libraries, drivers, and firmware integrations.
Collaborate with hardware, firmware, and SDK teams to co-design network features.
Validate and integrate prototypes into NVIDIA’s AI infrastructure and products.
What we need to see:
BSc/MSc/PhD in Computer Science or Electrical Engineering.
5+ years of relevant experience and/or knowledge.
Deep understanding of networking and communication internals: NCCL, RDMA/RoCE, congestion control.
Hands-on experience with HW/SW/FW integration and low-level programming (C/C++, kernel, drivers).
Some background in distributed training systems (such as PyTorch DDP, Megatron-LM, DeepSpeed).
Ways to stand out from the crowd:
Demonstrated innovation and leadership in turning prototypes into impactful product features.
Experience with programmable data planes (P4, eBPF, DOCA SDK, or switch SDKs).
Familiarity with NIC firmware scheduling, in-network compute, or congestion management.
Contributions to open-source projects, academic papers, or performance benchmarking tools.
Strong background in AI factory architectures, distributed inference, or network telemetry.