Expoint - all jobs in one place

Tesla Sr. Site Reliability Engineer, Bare Metal Infrastructure
United States, Texas, Austin 
646049882

16.04.2025
What to Expect

Tesla Cloud as a Service seeks a high-impact Site Reliability Engineer (SRE) to support our bare-metal provisioning platform at scale. You’ll provide direct support to internal customers, resolve complex provisioning issues, and escalate systemic problems to engineering. Your focus: ensuring reliable, automated delivery of bare-metal infrastructure using Kubernetes, Metal³, and industry-standard tooling across diverse hardware from Supermicro, HPE, and Dell.

What You’ll Do
  • Provide frontline support for Tesla Cloud Metal as a Service (MaaS) customers provisioning bare-metal servers
  • Troubleshoot and resolve hardware, firmware, network, and provisioning failures (PXE, DHCP, VLAN, BMC)
  • Automate image builds (Packer, QCOW2), server configurations (Ansible), and deployment workflows
  • Manage and maintain large-scale Kubernetes and Metal³-powered provisioning pipelines
  • Interface with BMCs via Redfish for remote management, firmware updates, and recovery actions
  • Escalate recurring issues and feature requests to engineering teams for roadmap improvements
  • Participate in a 24/7 on-call rotation to ensure high availability of the MaaS platform
  • Own observability: implement monitoring, alerting, and logging for critical systems
What You’ll Bring
  • Advanced proficiency in Golang and Python for automation and tooling
  • Deep Linux expertise (Ubuntu 22.04/24.04) with strong system internals knowledge
  • Proven experience with bare-metal provisioning at scale using Kubernetes and Metal³
  • In-depth knowledge of PXE booting, DHCP, TFTP, and VLAN tagging
  • Strong understanding of BMC firmware management and Redfish API operations
  • Skilled in infrastructure-as-code (Ansible), CI/CD workflows (GitHub Actions, Jenkins), and artifact management (Artifactory)
  • Experience supporting Supermicro, HPE, and Dell hardware in production environments
  • Ability to debug complex, cross-layer issues involving hardware, network, and software
  • Habitual documenter and knowledge sharer; committed to operational excellence
  • Bachelor’s Degree in Computer Science, Engineering, or equivalent practical experience