Limitless High-tech career opportunities - Expoint

Palo Alto Manager Site Reliability Engineering Technical Incidents - Cortex
India, Karnataka, Bengaluru
360320747

26.08.2025

Being the cybersecurity partner of choice, protecting our digital way of life.

Your Career

We’re seeking an experienced Cloud SRE lead to drive high-severity incident and problem management across our GCP-centric platforms. This role combines deep technical troubleshooting with process ownership, ensuring rapid recovery, root cause elimination, and long-term reliability improvements. You will own L3 on-call responsibilities, drive post-incident learning, and champion automation and operational excellence.

Your Impact

In your technical and leadership capacity, you will contribute to seamless production site reliability operations, partnering closely with regional and global SRE counterparts, with special attention to the following:
Incident Analysis & Problem Management: Implement and lead post-mortem processes within SLAs, identify root causes, and drive corrective actions to reduce repeat incidents. Establish and maintain a problem backlog, ensuring timely resolution and continuous process improvement.
Troubleshooting: Rapidly diagnose and resolve failures across Kubernetes, Terraform, and GCP using advanced troubleshooting frameworks.
Preventative Measures: Implement automation and enhanced monitoring to proactively detect issues and reduce incident frequency.
Stakeholder Communication: Work with GCP/AWS TAMs and other vendors to request new features and follow up on updates.
Mentorship: Coach and elevate SRE and DevOps teams, promoting best practices in reliability and incident/problem management.
Documentation: Establish and maintain a problem backlog, ensuring timely resolution and continuous process improvement.
Envision the future of SRE with AI/ML: Envision how a modern SRE team should operate by leveraging AI/ML.

Your Experience

12+ years of experience in SRE/DevOps/Infrastructure roles, with a strong foundation in cloud-based environments.
5+ years of proven experience managing SRE/DevOps teams, preferably with a strong focus on Google Cloud Platform (GCP).
Deep hands-on knowledge of Terraform, Kubernetes (GKE), GitLab CI/CD, and modern observability practices (e.g., Prometheus, OpenTelemetry).
Strong experience in managing incident response and postmortems, reducing MTTR, and driving proactive reliability improvements.
Proficiency with cloud platforms such as GCP & AWS.
Solid grasp of Infrastructure as Code, container orchestration, and scalable cloud architectures.
Track record of building tools for system reliability, automated remediation, and performance tuning.
Experience leveraging AI/ML-based operations tools for automation, anomaly detection, and predictive alerting is a plus.
Expertise in SLI/SLO/SLA design and implementation, and driving operational maturity through data.
Strong interpersonal and leadership skills, with a demonstrated ability to coach, mentor, and inspire teams.
Effective communicator, capable of translating complex technical concepts to non-technical stakeholders.
Committed to inclusion, collaboration, and creating a culture where every voice is heard and respected.

All your information will be kept confidential according to EEO guidelines.
