Job responsibilities
- Work with other SREs in the SRE community, the development team, and the infrastructure and cloud support organization to ensure that Platform service reliability, availability, and performance meet our customers' needs
- Plan and execute projects that improve the reliability or efficiency of the system
- Work closely with SWE and other partners to continuously evolve and expand the Platform capabilities
- Own availability, performance, and supportability targets of the service
- Identify opportunities and drive the design and implementation of end-to-end telemetry, alerting, self-healing, and automation capabilities to improve the Platform's availability, manageability, and reliability
Required qualifications, capabilities, and skills
- Formal training or certification in site reliability engineering concepts and 3+ years of applied experience
- Hands-on experience with cloud-native application development and operations
- Proficiency in site reliability culture and principles, and familiarity with how to implement site reliability within an application or platform
- Expertise in troubleshooting common networking issues, public cloud issues, and the Linux OS
- Proficiency in at least one programming language such as Go, Python, or Java/Spring Boot
- Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., public and private cloud, artificial intelligence, mobile app development)
- Experience in observability, such as white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Experience with continuous integration and continuous delivery tools such as Jenkins and GitLab, and with infrastructure-as-code tools such as Terraform and Ansible
- Familiarity with containers and container orchestration technologies such as ECS, Kubernetes, and Docker
Preferred qualifications, capabilities, and skills
- Familiarity with modern cloud-native technologies
- Familiarity with infrastructure-as-code tools
- Exposure to Kubernetes, networking, Linux, and scripting