Essential Responsibilities:
- Manage and lead a team of site reliability engineers, setting goals, providing mentorship, and ensuring the team delivers high-quality reliability solutions.
- Oversee the development and execution of strategies to ensure the availability, reliability, and performance of critical applications and platforms.
- Lead incident management efforts across teams, driving the timely resolution of high-impact incidents and effective post-incident reviews.
- Define and prioritize the site reliability engineering roadmap, aligning it with business objectives and ensuring that key reliability goals are met.
- Direct initiatives to improve system scalability, fault tolerance, and resilience to handle high volumes of traffic and failure scenarios.
- Oversee capacity planning efforts, ensuring system resources are proactively managed to meet current and future business needs.
- Develop the site reliability engineering team by providing career growth opportunities and technical guidance, and by fostering a culture of continuous improvement.
- Champion automation across operations and reliability processes, focusing on improving efficiency and reducing manual intervention.
- Work closely with engineering, product, and operations leadership to ensure site reliability strategies are aligned with overall business and technical goals.
- Define key performance indicators (KPIs) and reliability metrics, and provide regular reporting on team and system performance to executive leadership.
Minimum Qualifications:
- Minimum of 8 years of relevant work experience and a Bachelor's degree or equivalent experience.
Our Benefits:
Any general requests for consideration of your skills, please