• Lead and manage the Site Reliability Engineering (SRE) team to ensure high availability, reliability, and performance of systems on the Azure platform.
• Design and implement strategies for monitoring, alerting, and incident response.
• Collaborate with development and operations teams to establish SRE best practices and optimize system reliability.
• Automate infrastructure provisioning, configuration, and deployment using Infrastructure as Code (IaC).
• Drive root cause analysis and implement solutions for production issues.
• Develop and maintain SLOs, SLIs, and SLAs to improve service reliability and performance.
• Implement and oversee CI/CD pipelines for streamlined software delivery.
Required Technical and Professional Expertise
Preferred Technical and Professional Expertise
• Experience with microservices architecture and distributed systems.
• Knowledge of security best practices in cloud environments.
• Familiarity with Azure Policy and governance practices.
• Expertise in database performance tuning (e.g., Azure SQL, Cosmos DB).
• Leadership experience in managing cross-functional teams.
• Certifications such as CKA (Certified Kubernetes Administrator).
• Exposure to AI/ML workloads on Azure.
• Experience with hybrid or multi-cloud environments.
• Knowledge of serverless architecture (e.g., Azure Functions).