Site Reliability Engineer AI Infrastructure jobs at Tesla in United States, Palo Alto
Discover your perfect match with Expoint. Search for job opportunities as a Site Reliability Engineer AI Infrastructure in United States, Palo Alto and join the network of leading companies in the high tech industry, like Tesla. Sign up now and find your dream job with Expoint.
192 jobs found
12.08.2024
Tesla Site Reliability Engineer AI Infrastructure United States, California, Palo Alto
Support the AI/ML cluster infrastructure on both GPU and Dojo platforms, focusing on systems automation, configuration management and deployment at scale. Improve our monitoring & self-healing pipelines, as well as...
Own and develop reliable middleware to communicate with PXI test hardware and read back data. Build robust and flexible Python tools to automate test equipment that communicates over CAN, LIN,...
Reduce wall clock time to convergence of our training jobs by identifying bottlenecks in the ML stack, from data-loading up to the GPU. Integrate efficient, low-level code with the overall...
Write robust Python software code in our machine learning training repository while applying best software practices to support machine learning scientists in tasks such as fetching training data, preprocessing it,...
Take ownership of tooling software for the compiler and hardware monitoring. Develop algorithms to improve sensitivity and performance of the analysis tools. Debug functional issues on massively parallel systems, including compiler bugs, defective...
Apply solid knowledge of reliability methods and power electronics / high voltage systems to design accelerated test plans for design validation, burn-in testing, environmental stress screening (ESS), ongoing reliability testing...