Reduce wall-clock time to convergence of our training jobs by identifying bottlenecks across the ML stack, from data loading up to the GPU
Integrate efficient, low-level code with the overall high-level training framework
Profile our workloads and implement solutions to increase training efficiency (a representative profiling sketch follows this list)
Optimize workloads for efficient hardware utilization (e.g. CPU and GPU compute, data throughput, networking)
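To make the profiling responsibility concrete, here is a minimal sketch of locating a data-loading vs. compute bottleneck with torch.profiler; the model, loader, and sizes are hypothetical stand-ins, not our actual stack:

```python
import torch
from torch.profiler import profile, ProfilerActivity, schedule

# Hypothetical model and data-loader stand-ins; a real training job
# would substitute its own.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loader = [(torch.randn(64, 1024), torch.randn(64, 1024)) for _ in range(20)]

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=1, active=3),
) as prof:
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        loss = torch.nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        prof.step()  # advance the profiler schedule each iteration

# Time spent in host-to-device copies vs. kernels shows whether the
# job is input-bound or compute-bound.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```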
What You’ll Bring
Members of the Autopilot AI Infrastructure team are expected to be adaptable to the dynamic requirements of AI research and capable of contributing across all parts of the AI training software stack
Practical experience programming in Python and/or C/C++
Experience programming in CUDA, cuDNN, or Triton, particularly in the context of operations used in AI workloads (see the kernel sketch after this list)
Experience profiling and optimizing CPU-GPU interactions, such as pipelining computation with data transfers (see the overlap sketch after this list)
Experience working with training frameworks (ideally PyTorch)
Proficiency in system-level software, in particular hardware-software interactions and resource utilization
Experience with parallel programming concepts and primitives
Understanding of modern machine learning concepts and state-of-the-art deep learning
Experience scaling neural network training jobs across many GPUs (see the data-parallel sketch after this list)
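For the CUDA/Triton item above, a minimal sketch of the kind of custom kernel involved: an elementwise add in Triton. Names and the block size are illustrative only.

```python
import torch
import triton
import triton.language as tl

# A minimal Triton kernel: masked elementwise add over 1D tensors.
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program instance per block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```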
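For the CPU-GPU interaction item, a sketch of overlapping host-to-device copies with compute using pinned memory and a separate CUDA stream, assuming a single GPU; compute() and the batch shapes are placeholders.

```python
import torch

copy_stream = torch.cuda.Stream()
batches = [torch.randn(64, 1024).pin_memory() for _ in range(8)]

def compute(x):
    return x @ x.t()  # placeholder for a real forward/backward pass

# Prefetch batch i+1 on the copy stream while batch i computes on the
# default stream; non_blocking copies from pinned memory run async.
with torch.cuda.stream(copy_stream):
    next_gpu = batches[0].to("cuda", non_blocking=True)

for i in range(len(batches)):
    torch.cuda.current_stream().wait_stream(copy_stream)  # batch i copy done
    cur = next_gpu
    if i + 1 < len(batches):
        with torch.cuda.stream(copy_stream):
            next_gpu = batches[i + 1].to("cuda", non_blocking=True)
    out = compute(cur)

torch.cuda.synchronize()
```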
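And for multi-GPU scaling, a minimal data-parallel sketch with PyTorch DistributedDataParallel, whose backward pass all-reduces gradients across ranks (one of the parallel primitives mentioned above). It assumes a torchrun launch; the model and objective are placeholders.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Assumes launch via `torchrun --nproc_per_node=<num_gpus> train.py`,
    # which sets LOCAL_RANK and the rendezvous environment variables.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(1024, 1024).cuda(), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    for _ in range(10):
        x = torch.randn(64, 1024, device="cuda")
        loss = model(x).square().mean()  # placeholder objective
        optimizer.zero_grad()
        loss.backward()  # DDP all-reduces gradients across ranks here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```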