Lead the execution and evolution of the Problem Management practice across Cloud Operations
Identify recurring and pervasive issues through data analysis and pattern recognition
Facilitate and contribute to blameless postmortems and incident reviews
Deliver actionable insights and recommendations to engineering and product teams
Track and drive resolution of root causes, ensuring long-term fixes are implemented
Build strong relationships with Engineering, SRE, and Support teams to foster a culture of shared responsibility
Host recurring uptime and reliability review meetings with cross-functional stakeholders
Communicate findings and progress clearly and concisely to executive leadership
Serve as a trusted advisor during incident reviews and RCA development
Develop and maintain KPIs, dashboards, and metrics to measure the effectiveness of Problem Management
Contribute to the design and implementation of continuous service improvement (CSI) initiatives
Ensure compliance with internal standards and regulatory requirements
Mentor teams on best practices in problem identification, analysis, and resolution
Minimum Requirements:
5 years of experience in ITSM, SRE, or DevOps environments
Excellent communication skills, including executive-level reporting
Tenacious follow-through and attention to detail
Strong analytical and pattern recognition skills
Proven ability to lead cross-functional initiatives and influence without authority
Fast learner with a growth mindset and a passion for service reliability
3 years of experience with these or similar technologies: ServiceNow, Jira/Confluence, Excel
Successful completion of a background screening process including, but not limited to, employment verification, criminal search, OFAC, SS Verification, and credit and drug screening, where applicable and in accordance with federal and local regulations
Preferred Requirements:
Experience working in large-scale cloud environments