Expoint – all jobs in one place

Apple: Manager, Incident Response and Service Reliability
United States, New York, New York 
416818028

As the manager for the Incident Response and Service Reliability Team, you will lead the team responsible for Apple Wallet’s real-time incident response program. You will define and operate the processes for detecting, triaging, prioritizing, and mitigating service-impacting incidents. You will drive the proactive identification of recurring issues, lead root cause analysis, and partner with engineering to implement long-term fixes that reduce risk and improve reliability. Through close collaboration with engineering, infrastructure, SRE, and product teams, you will ensure that incidents are handled with urgency, communication is clear, and issues are addressed at the root.
Responsibilities:
  • Define and own the strategic vision for incident and problem management, integrating tooling, response structure, and continuous improvement across engineering.
  • Lead the end-to-end incident response program, including severity classification, escalation protocols, stakeholder communication, and real-time coordination.
  • Own the problem management function by identifying systemic issues, driving root cause analysis, and partnering with engineering to implement long-term fixes.
  • Manage a team of incident and problem managers, setting priorities, execution standards, and development goals.
  • Define and track operational health metrics (e.g., MTTD, MTTM, and MTTR: mean time to detect, mitigate, and recover), and drive improvements in detection, mitigation, and recovery timelines.
  • Oversee the adoption and evolution of incident tooling (e.g., monitoring, alerting, automation, documentation, and reporting).
  • Facilitate blameless post-incident reviews (PIRs) that result in clear accountability, cross-functional alignment, and durable outcomes.
  • Instill a culture of operational learning and resilience, and drive systemic and architectural improvements that reduce incident volume and minimize customer impact.
Qualifications:
  • Bachelor’s degree or equivalent practical experience.
  • 8+ years of experience in incident management, technical program management, or SRE/infra leadership roles.
  • Demonstrated experience building or scaling an incident management program in a production or customer-facing environment.
  • Proven ability to define, measure, and influence operational metrics (MTTD, MTTR, etc.).
  • Strong cross-functional collaboration skills, particularly with engineering, product, and executive stakeholders.
  • Excellent communication skills under pressure, with the ability to drive clarity and urgency.
  • Experience with incident tooling (e.g., PagerDuty, Opsgenie, Slack bots, observability platforms).
  • Experience in an engineering role (e.g., SRE, DevOps, or another engineering discipline) at a payments, banking, or other financial services company.
  • Experience leading incident programs across global teams or regulated environments.
  • Background in high-availability systems, payments infrastructure, or customer-critical services.
  • Familiarity with root cause analysis frameworks, postmortem facilitation, and chaos testing.
  • Experience integrating incident workflows with observability and BI platforms (e.g., Datadog, Grafana, Tableau).
  • Experience driving change in cross-functional or matrixed organizations.
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.