CheckPoint Site Reliability Team Leader 
Israel, Tel Aviv District, Tel Aviv-Yafo 
970960959

25.03.2025

We use an interesting and mixed technology stack: Kubernetes, Terraform, CI/CD pipelines, Datadog, Prometheus, and cloud-native architectures.

In this position, you will use your expertise in, and will.
Key Responsibilities
  • Design, build, and manage our SRE framework to ensure observability, resilience, and high availability.
  • Develop and automate solutions for proactive monitoring, incident response, and performance optimization.
  • Improve and maintain our alerting and monitoring stack, leveraging tools like Datadog, Prometheus, and Grafana.
  • Lead post-mortem analysis and implement continuous improvement initiatives.
  • Collaborate with DevOps, Engineering, and Product teams to ensure smooth and efficient delivery of reliable services.
Qualifications
  • An SRE and production manager with 5+ years of experience in SRE, Production Engineering, or DevOps, including 2+ years in a leadership role.
  • Experience with monitoring and observability tools like Datadog, Prometheus, and Grafana.
  • A problem solver, capable of finding creative solutions and getting things done.
  • Fluent in incident management, root cause analysis (RCA) processes, and operational best practices.
It would be great if you also have:
  • Experience in high-scale distributed systems.
  • Background in security and compliance for cloud infrastructure.
  • Familiarity with AWS (EKS, EC2, RDS, S3, networking configurations).
  • Understanding of cost optimization and resource management in cloud environments.
  • Familiarity with machine learning or predictive analytics for proactive reliability management.
  • Proficiency in Python, Go, or Bash for automation and scripting.