Citi Group Head Production Management Resiliency - Director 
United Kingdom, England, London 
672026809

29.05.2025

Implement Enhanced Testing and Recovery:

  • Oversee the implementation and execution of Production Swing testing for critical applications, ensuring applications run from their alternate site for a minimum of 5 days.

  • Implement and oversee Data Recovery testing, ensuring applications can recover critical data from backup solutions within the defined Impact Tolerance (ITOL).

  • Drive the onboarding of critical applications to the One-Touch Recovery orchestration solution.

  • Minimize the Recovery Time Actual (TRTA) for critical applications.

Design and Architecture:

  • Champion resilient application design by advocating for and integrating resiliency principles into architectures, and promoting the use of established resiliency patterns.

  • Leverage cloud-native services and features to enhance application resiliency. This includes services for auto-scaling, load balancing, and disaster recovery.

  • Explore and implement chaos engineering practices to proactively identify and address system weaknesses under stress.

Proactive Vulnerability Management:

  • Proactively identify vulnerabilities through regular architecture reviews, comprehensive scenario testing, and foundational testing.

  • Document and demonstrate mitigation efforts for all discovered vulnerabilities. This includes developing remediation plans, implementing necessary changes, and validating the effectiveness of mitigations.

  • Ensure that all identified vulnerabilities have remediation plans scheduled.

Operational Resilience Adherence:

  • Ensure that all critical applications adhere to operational resilience testing and recovery requirements.

  • Collaborate with relevant stakeholders to define and maintain appropriate impact tolerances for critical business services.

Performance Measurement and Reporting:

  • Monitor and report on key resilience metrics, including the number of applications executing Production Swing tests, the number of applications on One-Touch Recovery, recovery times, and adherence to operational resilience requirements.

  • Provide regular updates to senior management on the status of resilience initiatives and key performance indicators.

Key Qualifications:

  • Relevant professional software engineering experience, particularly in SRE roles

  • Expertise in analyzing complex application, database, network, and OS issues across distributed, large-scale, customer-facing systems

  • Strong communication skills and the ability to work effectively across multiple business and technical teams

  • Experience in Java, .NET, Maven, Gradle, Jenkins, Helm, Puppet, Chef, Ansible, Kubernetes, AWS, Splunk, Prometheus

  • BS degree in computer science or equivalent field

What we’ll provide you:

By joining Citi, you will not only be part of a business casual workplace with a hybrid working model (up to 2 days working at home per week), but also receive a competitive base salary (which is annually reviewed), and enjoy a whole host of additional benefits such as:

  • 27 days annual leave (plus bank holidays)

  • A discretionary annual performance-related bonus

  • Private Medical Care & Life Insurance

  • Employee Assistance Program

  • Pension Plan

  • Paid Parental Leave

  • Special discounts for employees, family, and friends

  • Access to an array of learning and development resources

Technology Product Management

