Develop and maintain a reliable monitoring and alerting system to detect and mitigate issues proactively.
Drive incident management processes and conduct post-mortem analyses to prevent future outages.
Stay ahead of industry trends and emerging technologies to continuously improve system reliability and performance.
Bachelor’s degree in Computer Science, Engineering, or a related field; Master’s preferred.
Minimum of 7 years of experience in SRE, DevOps, or similar roles, with at least 3 years in a leadership position.
Proficiency in programming languages such as Python, Go, or Java.
Extensive experience with cloud services (AWS, GCP, Azure) and with containerization and orchestration tools (Docker, Kubernetes).
Solid understanding of CI/CD pipelines and automation tools (Jenkins, Ansible, Terraform).
Exceptional knowledge of observability tools and of designing architecture for proactive monitoring of the product.
Proven track record of designing and implementing scalable, high-availability systems.
Exceptional problem-solving skills and the ability to work under pressure.