Tesla: Software Engineer, Model Scaling, Autopilot AI
Palo Alto, California, United States
Job ID: 538930357 | 17.04.2025
What You’ll Do
  • Design and Implement Large-Scale Data Pipelines: Build and maintain robust data processing pipelines that handle petabytes of autonomous vehicle data, including images, videos, and auto-generated labels, ensuring scalability and reliability

  • Optimize Neural Network Training Processes: Support neural network training by optimizing code and data formats for faster data loading, orchestrating auto-labeling jobs, and debugging bottlenecks to enhance overall training efficiency (see the data-loading sketch after this list)

  • Enhance System Performance: Develop and implement automation, monitoring, and optimization tools to improve system performance, including resource utilization, parallelism, and data I/O

  • Collaborate with Machine Learning Researchers: Work closely with researchers to understand and execute their data and infrastructure requirements, providing solutions that facilitate rapid experimentation and production-scale model deployment

  • Develop Evaluation Tools and Dashboards: Create and maintain evaluation metrics, tools, visualizations, and dashboards to support the development and refinement of neural networks

  • Implement Low-Level Integrations: Write efficient, low-level code that integrates with high-level training frameworks to enhance performance across various hardware platforms, including Dojo, Tesla’s supercomputer

  • Stay Updated with ML Advancements: Keep abreast of the latest advancements and technologies in machine learning engineering to continually improve Tesla’s AI infrastructure
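
To make the data-loading responsibility above concrete, here is a minimal sketch of a tuned PyTorch input pipeline using parallel workers, pinned host memory, and prefetching. It is illustrative only: the ShardedFrameDataset class, its synthetic samples, and all parameter values are hypothetical stand-ins, not Tesla's actual pipeline.

    import torch
    from torch.utils.data import Dataset, DataLoader

    class ShardedFrameDataset(Dataset):
        """Hypothetical stand-in for a reader over sharded fleet data.
        It synthesizes image tensors and labels so the sketch runs end to end."""

        def __init__(self, num_samples: int = 1024):
            self.num_samples = num_samples

        def __len__(self) -> int:
            return self.num_samples

        def __getitem__(self, idx: int):
            image = torch.randn(3, 224, 224)           # stand-in for a decoded camera frame
            label = torch.randint(0, 10, (1,)).item()  # stand-in for an auto-generated label
            return image, label

    def build_loader(batch_size: int = 64) -> DataLoader:
        # Parallel workers, pinned memory, and prefetching are the usual levers
        # for keeping GPUs fed; the right values are workload-dependent.
        return DataLoader(
            ShardedFrameDataset(),
            batch_size=batch_size,
            shuffle=True,
            num_workers=4,
            pin_memory=True,
            persistent_workers=True,
            prefetch_factor=2,
        )

    if __name__ == "__main__":
        images, labels = next(iter(build_loader()))
        print(images.shape, labels.shape)  # torch.Size([64, 3, 224, 224]) torch.Size([64])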

What You’ll Bring
  • Strong Software Engineering Skills: Extensive experience with Python and software engineering best practices, including code optimization and system-level programming

  • Experience with Deep Learning Frameworks: Proficiency in one or more deep learning frameworks, such as PyTorch or TensorFlow, with hands-on experience in optimizing model training processes

  • Data Manipulation and Analysis Expertise: Proficiency with data manipulation tools, including Jupyter notebooks, NumPy, SciPy, Matplotlib, and scikit-learn, and experience handling large-scale data processing

  • System Optimization and Debugging: Demonstrated experience in profiling and optimizing CPU/GPU code and debugging complex system-level software to ensure high performance and reliability

  • Distributed Systems Experience: Proven track record of building and managing large-scale distributed systems, particularly in AI/ML workflows, with a deep understanding of parallel computing, resource utilization, and data handling

  • Knowledge of Storage and Data Formats: Strong understanding of underlying storage mechanisms and experience designing and optimizing data formats for machine learning workflows

  • Familiarity with High-Performance Networking: Experience with high-performance networking technologies, such as InfiniBand, RDMA, and NCCL, is a plus (see the distributed training sketch after this list)

  • Passion for AI and Machine Learning: A deep understanding of machine learning concepts and a passion for staying current with the latest advancements in AI research and engineering
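
As an illustration of the distributed-systems, profiling, and NCCL items above, the sketch below runs a few data-parallel training steps with PyTorch's DistributedDataParallel over the NCCL backend and profiles them. The toy linear model, the batch sizes, and the assumption that a launcher such as torchrun sets the rank environment variables are illustrative choices, not requirements stated in this posting.

    import os

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # A launcher such as `torchrun --nproc_per_node=<gpus> train.py` is assumed
        # to set RANK, LOCAL_RANK, and WORLD_SIZE; NCCL then handles the GPU-to-GPU
        # collectives (typically over RDMA/InfiniBand when available).
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        model = DDP(torch.nn.Linear(512, 10).cuda(local_rank), device_ids=[local_rank])  # toy model
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

        # Profile a handful of steps to surface data-loading or communication bottlenecks.
        with torch.profiler.profile(
            activities=[
                torch.profiler.ProfilerActivity.CPU,
                torch.profiler.ProfilerActivity.CUDA,
            ]
        ) as prof:
            for _ in range(5):
                inputs = torch.randn(64, 512, device=local_rank)
                targets = torch.randint(0, 10, (64,), device=local_rank)
                loss = torch.nn.functional.cross_entropy(model(inputs), targets)
                optimizer.zero_grad()
                loss.backward()  # gradients are all-reduced across ranks via NCCL
                optimizer.step()

        if dist.get_rank() == 0:
            print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()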