Job responsibilities
- Guides and assists others in building designs at the appropriate level and gaining consensus from peers where appropriate
- Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
- Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
- Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
- Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers
- Applies strong analytical skills to diagnose and resolve complex technical issues
- Performs root cause analysis and implements preventive measures
- Manages incidents and coordinates response efforts
- Drives initiatives for process and system improvements
- Supports the adoption of site reliability engineering best practices within your team
- Completes the SRE Bar Raiser Program
Required qualifications, capabilities, and skills
- Formal training or certification as a Site Reliability Engineer in an enterprise infrastructure environment and 3+ years of applied experience
- Proficiency in site reliability culture and principles, and familiarity with implementing site reliability within an application or platform
- Proficiency in at least one programming language such as Python, Java/Spring Boot, or .NET
- Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., cloud, artificial intelligence, Android)
- Experience in observability, including white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
- Familiarity with containers and container orchestration technologies such as ECS, Kubernetes, and Docker
- Familiarity with CI/CD pipelines and tools like Jenkins, GitLab CI, or CircleCI
- Proficiency in scripting languages like Python
- Experience with cloud platforms like AWS, Google Cloud, or Azure
- Understanding of infrastructure as code (IaC) using tools like Terraform or Ansible
Preferred qualifications, capabilities, and skills
- Strong communication skills to collaborate with cross-functional teams
- Skills in planning for future growth and scalability of systems
- Experience with data protection solutions such as Cohesity or Commvault