Tesla: Software Engineer, Training Performance, AI Infrastructure
United States, California, Palo Alto
698571949

17.04.2025
What You’ll Do
  • Reduce the wall-clock time to convergence of our training jobs by identifying bottlenecks across the ML stack, from data loading to the GPU
  • Integrate efficient, low-level code with the overall high-level training framework
  • Profile our workloads and implement solutions to increase training efficiency
  • Optimize workloads for efficient hardware utilization (e.g., CPU and GPU compute, data throughput, networking)
What You’ll Bring
  • Members of the Autopilot AI Infrastructure team are expected to be adaptable to the dynamic requirements of AI research and capable of contributing across all parts of the AI training software stack
  • Practical experience programming in Python and/or C/C++
  • Experience programming in CUDA, cuDNN, or Triton, particularly for operations used in AI workloads
  • Experience profiling and optimizing CPU-GPU interactions (e.g., pipelining computation with data transfers)
  • Experience working with training frameworks (ideally PyTorch)
  • Proficiency in system-level software, particularly hardware-software interactions and resource utilization
  • Experience with parallel programming concepts and primitives
  • Understanding of modern machine learning concepts and state-of-the-art deep learning
  • Experience scaling neural network training jobs across many GPUs