Palo Alto DevOps Operations Engineering Senior Manager Cortex 
Israel, Tel Aviv District, Tel Aviv-Yafo 
415250971

18.02.2025

Being the cybersecurity partner of choice, protecting our digital way of life.

Your Career

You will play a critical role in ensuring the stability, scalability, and efficiency of our high-scale production systems. You will lead teams responsible for maintaining production reliability, managing on-call processes, and driving automation for incident remediation.

Your Impact

  • Ensure the ongoing resilience, stability, and availability of production environments.

  • Lead the design and implementation of on-call processes that reduce noise and improve incident response times.

  • Drive the development of remediation automation to prevent recurring issues and reduce manual intervention.

  • Collaborate with engineering teams to align reliability goals.

  • Establish best practices for system upgrades, maintenance processes, and operational playbooks.

  • Foster a culture of continuous improvement through post-incident reviews and proactive problem-solving.

  • Empower your team to build self-service tools that enhance operational efficiency and reduce toil.

  • Promote and enhance observability practices to improve system monitoring, alerting, and diagnostics.

Your Experience

  • Strong management experience: 7+ years managing SRE and operations groups, driving stability and efficiency improvements.

  • Proven hands-on background in the DevOps domain: 5+ years of experience, including on-call shifts, system upgrades, incident management, and driving reliability improvements through automation.

  • Demonstrated ability to work cross-functionally in a matrix organizational structure.

  • Ability to communicate complex technical concepts to both technical and non-technical stakeholders, ensuring alignment across the organization.

  • Proven experience managing large-scale, complex systems and ensuring stability and performance at scale.

  • Strong analytical and troubleshooting skills, with a proactive approach to identifying and resolving issues before they impact production.

  • Strong foundation in cloud infrastructure (AWS, GCP, Azure; GCP preferred), Kubernetes, and monitoring tools (Prometheus, Grafana, etc.).

All your information will be kept confidential according to EEO guidelines.