Expoint - all jobs in one place

Uber: Sr. Cloud Reliability Engineer, Platform Engineering
United States, West Virginia 
893144337

About the Role

---- What the Candidate Will Do ----

  • Incident Management & Response: Lead cloud incident management efforts, ensuring rapid detection, triage, and resolution across all cloud platforms.
  • Root Cause Analysis & SLA Compliance: Evolve key processes to ensure that cloud incident root cause analyses (RCAs) are completed within the agreed Service Level Agreements (SLAs), track all action items, and drive continuous improvement in cloud reliability.
  • Monitoring & Automation: Unify automated monitoring, alerting mechanisms, and centralized incident logging to improve detection and response times.
  • Reporting & Insights: Develop targeted reporting to provide directly relevant cloud reliability insights.
  • Continuous Improvement: Identify patterns in incidents, optimize response playbooks, and enhance incident management frameworks for ongoing operational resilience.

---- Basic Qualifications ----

  • 5+ years of experience in cloud incident management, SRE, or operations.
  • Expertise in multi-cloud environments.
  • Experience with incident detection, response, and RCA processes.
  • Strong analytical and problem-solving skills, with the ability to work under pressure.
  • Excellent communication and stakeholder management skills.

---- Preferred Qualifications ----

  • Certifications in cloud platforms.
  • Hands-on experience with incident escalation procedures and service recovery plans.
  • Experience with automated logging and forensic analysis tools.
  • Familiarity with SLAs, compliance, and audit processes.
  • Prior experience working in a highly scalable global organization.

* Accommodations may be available based on religious and/or medical conditions, or as required by applicable law. To request an accommodation, please reach out to .