Job responsibilities
- Guides and assists others in building designs at the appropriate level and gaining consensus from peers where needed
- Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
- Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
- Implements infrastructure, configuration, and network as code for the applications and platforms in their remit
- Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
- Understands service level indicators and uses service level objectives to proactively resolve issues before they impact customers
- Supports the adoption of site reliability engineering best practices within their team
 
Required qualifications, capabilities, and skills
- Formal training or certification in Site Reliability Engineering concepts and 3+ years of applied experience
- Experience in SRE, DevOps, or application support roles, with knowledge of SLIs/SLOs, incident response, and troubleshooting
- Familiarity with monitoring and observability tools (e.g., Grafana, Prometheus, Splunk, OpenTelemetry)
- Hands-on experience with CI/CD pipelines (Jenkins, including global libraries), infrastructure as code (Terraform), version control (Git), containerization (Docker), and orchestration (Kubernetes)
- Exposure to cloud platforms (AWS, GCP, or Azure) and to automating infrastructure and deployments
- Willingness to participate in an on-call rotation and respond to production incidents
 
Preferred qualifications, capabilities, and skills
- Experience in banking, fintech, or other regulated environments
- Participation in game days or chaos engineering exercises
- Interest in sharing knowledge and best practices with peers