Job responsibilities
- Manages team members’ development by ensuring they have access to resources needed for learning
- Collaborates across the firm to align team members for mobility opportunities in line with their career aspirations
- Applies a wide range of tactics and strategies to guide internal executive decisions to achieve substantial goals
- Manages multiple stakeholders and complex projects involving large teams
- Implements innovative methods, techniques, and evaluation criteria for projects and people working on highly complex business issues
Required qualifications, capabilities, and skills
- Formal training or certification in site reliability/software engineering concepts and 10+ years of applied experience. In addition, 5+ years of experience leading technologists to manage, anticipate, and solve complex technical items within your domain of expertise
- Influences the teams' culture by championing innovation and change for firmwide success
- Expertise in monitoring tools (e.g., Prometheus, Grafana, Nagios) and logging systems (e.g., ELK stack, Splunk)
- Ability to implement and manage observability practices to ensure system reliability
- Proficiency in cloud platforms (e.g., AWS, Azure, Google Cloud) and their services
- Experience in implementing SRE principles and practices to improve system reliability and availability
- Proficiency in SQL and NoSQL databases and data warehousing solutions
- Experience hiring, developing, and recognizing talent
- Demonstrated prior experience influencing across highly matrixed, complex organizations and delivering value at scale
- Experience leading complex projects supporting site reliability engineering design, scaling, resilience, and system performance assessments
Preferred qualifications, capabilities, and skills
- Knowledge of data governance frameworks and best practices
- Familiarity with data privacy regulations (e.g., GDPR, CCPA)
- Skills in identifying and resolving performance bottlenecks
- Experience with load testing and capacity planning