Design, implement, and maintain scalable, highly available, and fault-tolerant systems in cloud environments.
Optimize the performance, cost, and efficiency of cloud infrastructure by leveraging cloud-native tools and services.
Monitor infrastructure and applications to ensure optimal performance, availability, and security.
Troubleshoot production issues in infrastructure platforms, applications, and services, including root cause analysis and resolution.
Implement automated monitoring and alerting to identify performance bottlenecks and downtime before they impact users.
Collaborate with DevX application teams to automate and streamline the deployment of applications and updates to production environments using CI/CD pipelines.
Ensure smooth and efficient release management, including managing environment configurations and minimizing downtime during production releases.
Maintain version control and manage rollback strategies for production releases.
Participate in on-call rotations to provide 24/7 production support for critical incidents in the cloud platform.
Lead incident management processes, including troubleshooting, escalation, and resolution of production issues.
Document incidents and solutions for future reference and continuous improvement.
Minimum Qualifications:
BS/MS in Computer Science.
At least 12 years of experience, including experience in production engineering, site reliability, or a similar role.
In-depth experience with container platforms such as Google Anthos.
Strong understanding of networking, containers (e.g., Docker, Kubernetes), microservices architectures, and distributed systems. Proficient in CI/CD tools (e.g., Jenkins, Argo CD) and version control systems (e.g., GitHub).
Strong understanding of CI/CD pipelines, observability (monitoring, logging, tracing), and incident management frameworks.
Excellent problem-solving skills, with the ability to diagnose and resolve complex production issues.
Strong communication and leadership skills, with a track record of driving technical initiatives.