What you'll be doing:
Architect developer-focused products that simplify high-performance inference and training deployment across diverse GPU architectures.
Define the multi-year strategy for kernel and communication libraries by analyzing performance bottlenecks in emerging AI workloads.
Collaborate with CUDA kernel engineers to design intuitive, high-level abstractions for memory and distributed execution.
Partner with open-source communities like Triton and FlashInfer to shape and drive ecosystem-wide roadmaps.
What we need to see:
5+ years of technical PM experience shipping developer products for GPU acceleration, with expertise in HPC optimization stacks.
Expert-level understanding of CUDA execution models and multi-GPU protocols, with a proven track record of translating hardware capabilities into software roadmaps.
BS or MS in Computer Engineering or a related field, or equivalent experience demonstrating expertise in parallel computing architectures.
Strong interpersonal and communication skills, with experience explaining complex optimizations to developers and researchers.
Ways to stand out from the crowd:
PhD or equivalent experience in Computer Engineering or a related technical field.
Contributed to performance-critical open-source projects like Triton, FlashAttention, or TVM with measurable adoption impact.
Crafted GitHub-first developer tools with >1k stars or similar community engagement metrics.
Published research on GPU kernel optimization, collective communication algorithms, or ML model serving architectures.
Experience building cost-per-inference models incorporating hardware utilization, energy efficiency, and cluster scaling factors.
You will also be eligible for equity.