

What you will be doing:
Analyze state-of-the-art AI models, identifying key performance bottlenecks and opportunities at the kernel level.
Develop, optimize, and evaluate both hand-tuned and compiler-generated kernels for inference workloads, balancing speed and flexibility.
Design and build high-level DSLs and innovative compiler infrastructure to increase kernel developer productivity while achieving near-peak performance.
Collaborate with AI model inference and compiler teams to iterate on kernel fusion, auto-tuning, and sophisticated GPU programming techniques.
Benchmark performance across real workloads, diagnose root causes, and rapidly deploy optimizations that maximize hardware utilization on NVIDIA platforms.
What we need to see:
Bachelor’s, master’s or PhD degree in Computer Science, Computer Engineering or related field, or equivalent experience.
3+ years of strong C++ and/or Python programming experience for system and performance engineering.
Understanding of GPU architecture and proficiency in CUDA programming.
Intellectual curiosity and an interest in solving exciting problems and delivering practical results in production environments.
Ways to stand out from the crowd:
Experience designing, developing and optimizing high-efficiency GPU kernels for modern AI workloads.
Experience building compilers, domain-specific languages, or automatic optimization systems.
Familiarity with popular compiler, GPU programming, and AI frameworks such as MLIR, LLVM, PyTorch, XLA, Triton, or CUTLASS.
Experience with AI/ML inference workloads and model performance analysis.
Strong communication skills and ability to collaborate in a cross-team environment.
You will also be eligible for equity and benefits.