What You’ll Do
- Perform scaling law analyses on model size, data size, data mixture, training compute, and other critical parameters to optimize our AI models using the largest self-driving dataset in the world (see the scaling-law sketch after this list)
- Develop and implement novel architectures and algorithms to effectively scale large End-to-End (E2E) self-driving models
- Create and maintain infrastructure for efficient, large-scale distributed training of E2E models, resolving compute and memory bottlenecks for training and inference
- Evaluate and enhance model performance, with a focus on increasing miles driven without human intervention
- Work closely with cross-functional teams to deploy AI models in production, ensuring they meet stringent performance and reliability standards
- Contribute to the development of tools and frameworks that improve the scalability and efficiency of model training and deployment processes
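To make the scaling-law responsibility above concrete, here is a minimal, illustrative sketch of fitting a saturating power law to results from a model-size sweep. All data points, initial guesses, and the 10B-parameter extrapolation are placeholders, not measurements from this role or dataset.

```python
# Illustrative sketch only: fit a power-law scaling curve
# loss(N) = a * N**(-alpha) + c to (parameter count, validation loss) pairs.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, alpha, c):
    # Saturating power law commonly used in scaling-law analyses.
    return a * n ** (-alpha) + c

# Hypothetical (model size, loss) observations from a model-size sweep.
sizes = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
losses = np.array([2.31, 2.10, 1.93, 1.81, 1.72])

params, _ = curve_fit(power_law, sizes, losses, p0=[10.0, 0.1, 1.0], maxfev=10000)
a, alpha, c = params
print(f"fit: loss(N) ~= {a:.2f} * N^(-{alpha:.3f}) + {c:.2f}")

# Extrapolate the fitted curve to a larger model to inform compute allocation.
print(f"predicted loss at 10B params: {power_law(1e10, *params):.3f}")
```

In practice the same fitting procedure is repeated over data size, data mixture, and compute to decide how to allocate a fixed training budget.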
What You’ll Bring
- Proven experience in scaling and optimizing large AI models, with a strong understanding of infrastructure challenges and solutions
- Proficiency in Python and a deep understanding of software engineering best practices
- In-depth knowledge of deep learning fundamentals, including optimization techniques, loss functions, and neural network architectures
- Experience with deep learning frameworks such as PyTorch, TensorFlow, or JAX
- Strong expertise in distributed computing and parallel processing techniques (see the distributed training sketch after this list)
- Demonstrated ability to work collaboratively in a cross-functional team environment
- Strong problem-solving skills and the ability to troubleshoot complex system-level issues
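As a reference point for the distributed computing and framework requirements above, the following is a minimal, illustrative sketch of data-parallel training with PyTorch DistributedDataParallel and mixed precision. The model, batch shapes, and hyperparameters are placeholders standing in for a large end-to-end driving model, not an actual training setup used here.

```python
# Minimal, illustrative DDP sketch; launch with torchrun,
# e.g. `torchrun --nproc_per_node=8 train_ddp.py`.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model standing in for a large end-to-end driving model.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 256),
    ).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()  # mixed precision eases memory pressure

    for step in range(100):
        # Synthetic batch; a real pipeline would shard data with a DistributedSampler.
        x = torch.randn(32, 1024, device=local_rank)
        target = torch.randn(32, 256, device=local_rank)

        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.mse_loss(model(x), target)
        scaler.scale(loss).backward()  # gradients are all-reduced across ranks
        scaler.step(optimizer)
        scaler.update()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The same pattern extends to larger-scale techniques such as sharded optimizers or model parallelism when a single data-parallel replica no longer fits in device memory.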