We are looking for a forward-thinking HPC and AI Inference Software Architect to help shape the future of scalable AI infrastructure, focusing on distributed training, real-time inference, and communication optimization across large-scale systems.
What you will be doing:
Design and prototype scalable software systems that optimize distributed AI training and inference—focusing on throughput, latency, and memory efficiency.
Develop and evaluate enhancements to communication libraries such as NCCL, UCX, and UCC, tailored to the unique demands of deep learning workloads.
Collaborate with AI framework teams (e.g., TensorFlow, PyTorch, JAX) to improve integration, performance, and reliability of communication backends.
Co-design hardware features (e.g., in GPUs, DPUs, or interconnects) that accelerate data movement and enable new capabilities for inference and model serving.
Contribute to the evolution of runtime systems, communication libraries, and AI-specific protocol layers.
Collaborate with customers to understand their needs and provide innovative solutions for them.
What we need to see:
Ph.D., Master's, or Bachelor's degree in computer science, computer engineering, electrical engineering, or a closely related field.
5+ years of experience with DNNs, scaling of DNNs, parallelism in DNN frameworks, or deep learning training workloads.
Deep understanding of inference and training workloads and their optimizations, such as prefill/decode, data parallelism, tensor parallelism, FSDP, etc.
Experience with AI network parallelism using collective libraries and RDMA/RoCE.
Background in algorithm design, system programming, and computer architecture.
Strong programming and software development skills.
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
Deep understanding of technology and passion for what you do.
Strong collaborative and interpersonal skills, including a proven ability to guide and influence effectively within a dynamic, matrixed environment.
Background in designing communication middleware for high-performance computing systems, including RoCE and DPUs.
Background in CUDA programming, NVIDIA GPUs, and programming models for emerging architectures.