Expoint - all jobs in one place

Citi: Group Head Production Management Resiliency - Director, New York (Hybrid)
United States, New York, New York 
261708214

Implement Enhanced Testing and Recovery:

  • Oversee the implementation and execution of Production Swing testing for critical applications, ensuring applications run from their alternate site for a minimum of 5 days.
  • Implement and oversee Data Recovery testing, ensuring applications can recover critical data from backup solutions within the defined Impact Tolerance (ITOL).
  • Drive the onboarding of critical applications to the One-Touch Recovery orchestration solution.
  • Minimize the Recovery Time Actual (TRTA) for critical applications.

Design and Architecture:

  • Champion resilient application design by advocating for and integrating resiliency principles into architectures, and promoting the use of established resiliency patterns.
  • Leverage cloud-native services and features to enhance application resiliency. This includes services for auto-scaling, load balancing, and disaster recovery.
  • Explore and implement chaos engineering practices to proactively identify and address system weaknesses under stress.

Proactive Vulnerability Management:

  • Proactively identify vulnerabilities through regular architecture reviews, comprehensive scenario testing, and foundational testing.
  • Document and demonstrate mitigation efforts for all discovered vulnerabilities. This includes developing remediation plans, implementing necessary changes, and validating the effectiveness of mitigations.
  • Ensure that all identified vulnerabilities have remediation plans scheduled.

Operational Resilience Adherence:

  • Ensure that all critical applications adhere to operational resilience testing and recovery requirements.
  • Collaborate with relevant stakeholders to define and maintain appropriate impact tolerances for critical business services.

Performance Measurement and Reporting:

  • Monitor and report on key resilience metrics, including the number of applications executing Production Swing tests, the number of applications on One-Touch Recovery, recovery times, and adherence to operational resilience requirements.
  • Provide regular updates to senior management on the status of resilience initiatives and key performance indicators.

Key Qualifications:

  • 7+ years of professional software engineering experience
  • 4+ years of experience in SRE roles
  • Expertise in analyzing complex application, database, network, and OS issues across distributed, large-scale, customer-facing systems
  • Strong communication skills and ability to work effectively across multiple business and technical teams
  • Experience in Java, .NET, Maven, Gradle, Jenkins, Helm, Puppet, Chef, Ansible, Kubernetes, AWS, Splunk, Prometheus
  • BS degree in computer science or equivalent field

Applications Support

Full time
New York, New York, United States
$170,000.00 - $300,000.00



Anticipated Posting Close Date:

Jul 01, 2025
