Engage in and improve the lifecycle of cloud services, from inception and design through deployment and operation.
Automate repetitive manual tasks and develop tools to improve the efficiency of the platform and infrastructure.
Analyze defects, propose improvements and drive efficiencies in systems and processes.
As part of Site Reliability, ensure the reliability, availability, and performance of the cloud infrastructure and platform.
Demonstrate Site Reliability principles and practices every day and champion their adoption throughout your team.
Develop observability and telemetry tools.
Author and improve the quality of technical engineering documentation.
Debug and resolve issues in a production environment.
Participate in SRE on-call rotations and escalation workflows.
Preferred qualifications, capabilities, and skills
Bachelor's degree in Computer Science, Information Technology, or equivalent technical qualification or professional experience.
Applied knowledge of site reliability culture and principles.
Expertise in building solutions with AWS cloud services.
Proficiency in programming with Python.
Knowledge of Infrastructure as Code tools such as Terraform.
Systematic problem-solving and troubleshooting skills.
Excellent communication skills for working with stakeholders and domain experts across the company to design solutions to user problems.
Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive.