Required qualifications, capabilities, and skills
- Formal training or certification in SRE concepts and 5+ years of applied experience.
- Expertise in SRE principles, including the reliability, scalability, and performance of applications and infrastructure.
- Expertise in programming with Python and in Infrastructure as Code tools such as Terraform.
- Experience in architecting distributed systems and cloud-native architecture in AWS.
- Systematic problem-solving and troubleshooting skills in complex systems.
- Excellent communication skills and the ability to present business and technical concepts to stakeholders.
Preferred qualifications, capabilities, and skills
- Experience working in AI, ML, or data engineering.
- Familiarity with container orchestration/Kubernetes.
- Familiarity with developing automation frameworks/AIOps.