What you'll be doing:
Design and implement a Pythonic language interface for tile-aware GPU programming
Optimize compiler pipelines for efficient execution
Integrate with AI/ML frameworks
Develop performance-critical primitives for tensor and memory operations
Collaborate with hardware teams to co-design compiler optimizations for emerging GPU architectures, including Tensor Core utilization and distributed execution
What we need to see:
Bachelor's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience); MS or PhD preferred
5+ years of academic or industry experience in ML/DL systems development, preferably with compilers
Strong Python and C/C++ programming skills
Expert-level experience developing or using deep learning frameworks (e.g., PyTorch, JAX, Triton)
Strong sense of ownership, ability to learn quickly, and a passion for quality and user experience
Ways to stand out from the crowd:
Proficiency in Python and C++ for DSL development and low-level compiler internals
Experience with compiler frameworks (Triton, LLVM, MLIR, TVM) and IR design for GPUs
Deep understanding of GPU architectures (CUDA cores, Tensor Cores, memory hierarchies) and tile-based execution models
Strong experience in machine learning systems research and productionization
Open source project ownership or contributions
You will also be eligible for equity.