Job responsibilities
- Manages the Product Infrastructure Engineering/SRE function for business-critical, highly regulated streaming products on hybrid multi-cloud firmwide platforms
- Influences software and infrastructure engineering teams to reduce toil and improve operational efficiency
- Manages the development of India-based messaging, integration, and streaming team members by ensuring they have access to the resources they need for learning
- Applies a wide range of tactics and strategies to guide internal executive decisions to achieve substantial goals
- Collaborates with multiple stakeholders and program managers on complex projects and takes accountability for streaming products
- Collaborates with support teams (on TSE, SLA, and SLO matters) and other infrastructure engineering teams
- Manages incident and problem management for streaming products
- Provides data insights for the monthly business reviews
- Owns end-to-end accountability for audit, risk, and controls for the streaming products
- Champions the firm’s culture of diversity, equity, inclusion, and respect for team members, and prioritizes diverse representation
Required qualifications, capabilities, and skills
- Formal training or certification on data management concepts and 10+ years of applied experience. In addition, 5+ years of experience leading technologists to manage, anticipate, and solve complex technical problems within your domain of expertise, with a focus on Java and Python.
- Experience building large-scale distributed managed services or large-scale infrastructure platforms such as Kafka or other streaming platforms
- Deep product knowledge of Apache Kafka and its ecosystem of products such as Kafka Connect, Schema Registry, and MSK
- Experience building and managing self-service/REST APIs and scaling them on demand
- Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform)
- Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker)
- Demonstrated prior experience influencing across highly matrixed, complex organizations and delivering value at scale
- Experience leading complex projects supporting site reliability engineering design, scaling, resilience, and system performance assessments for highly critical and regulated products
- Demonstrated prior experience developing SLOs/SLIs, maturity uplifts, observability strategies, and toil reduction at enterprise scale
- Demonstrated prior experience managing high severity production incidents to resolution
- Demonstrated experience managing audit/controls for multiple technologies
Preferred qualifications, capabilities, and skills
- Experience hiring, developing, and recognizing talent
- SRE experience with SaaS applications
- Knowledge of messaging products such as IBM MQ, RabbitMQ, and SQS