Role Rank: Senior
You will act as a senior escalation point for major infrastructure-related incidents, ensuring timely resolution and operational stability. You will provide technical leadership in issue triaging, perform advanced troubleshooting, and coordinate with engineering and product teams on complex infrastructure challenges. This role requires deep expertise in AWS cloud infrastructure, automation, and incident response, with a focus on mentoring junior engineers and enhancing SOPs and runbooks for streamlined support operations.
Your key responsibilities
- Lead incident response and coordination for AWS infrastructure issues, ensuring timely troubleshooting and resolution.
- Act as the primary escalation point for critical incidents that require in-depth analysis and coordination with engineering teams.
- Own and execute SOPs and runbooks to manage cloud infrastructure-related requests, issues, and remediation activities.
- Review and refine incident handling processes to enhance troubleshooting efficiency within the AHD team.
- Conduct log analysis and system diagnostics using various tools and the work notes recorded in the ITSM tool.
- Ensure the team provides proper access management and request fulfilment, including IAM role validation, security configurations, and VPC networking support.
- Monitor and troubleshoot containerized environments and infrastructure components.
- Provide technical mentorship and training for junior engineers, improving incident handling and automation skills.
- Work closely with product teams to identify recurring issues, document knowledge base updates, and drive SOP/process standardization.
- Participate in shift handovers and governance meetings, ensuring knowledge transfer and clear communication of ongoing issues.
- Provide guidance to junior engineers on handling cloud infrastructure issues and following best practices.
Skills and attributes for success
- Strong technical leadership and escalation management skills.
- Deep expertise in AWS infrastructure operations, including EC2, IAM, VPC, and security groups.
- Hands-on experience with Kubernetes (EKS), Helm, and container orchestration.
- Strong log analysis and troubleshooting experience using AWS CloudWatch and OpenTelemetry (OTEL).
- Experience working with ITSM tools.
- Ability to analyse trends, identify recurring issues, and propose automation-driven solutions.
- Excellent communication and stakeholder coordination skills to work with product teams.
- Experience in refining SOPs, troubleshooting guides, and runbooks for operational efficiency.
To Qualify for the Role, You Must Have
- 7+ years of experience in cloud infrastructure operations, incident management, and technical support.
- Deep understanding of AWS security principles, IAM policies, and encryption mechanisms.
- Experience troubleshooting and managing Kubernetes (EKS), Helm, and containerized workloads.
- Experience working with ITSM tools.
- Strong problem-solving skills with experience in handling major incidents and leading root cause analysis (RCA).
- Willingness to work in a 24x7 rotational shift-based support environment.
- No location constraints; ability to collaborate with global teams.
Must haves
- Cloud Platforms: AWS
- ITSM tool: Any (ServiceNow preferred)
- Infrastructure Operations: AWS Security Groups, VPC Peering, Load Balancers
- Containerization & Orchestration: Kubernetes (EKS), Helm, Docker
- Logging & Monitoring: AWS CloudWatch, OpenTelemetry
- Infrastructure as Code (IaC) & Automation: Executing Terraform- or Ansible-based automation scripts
- Certification: Any AWS certification
Good to have
- Networking: Advanced troubleshooting in AWS networking and security best practices
- Security & Compliance: AWS Security Hub, IAM Policies, and Cloud Security Posture
- Infrastructure Compliance Understanding: AWS Well-Architected Framework principles
What we look for
- Enthusiastic learners with a passion for cloud technologies and practices.
- Problem solvers with a proactive approach to troubleshooting and optimization.
- Team players who can collaborate effectively in a remote or hybrid work environment.
- Detail-oriented professionals with strong documentation skills.
What we offer
EY Global Delivery Services (GDS) is a dynamic and truly global delivery network. We work across six locations – Argentina, China, India, the Philippines, Poland and the UK – and with teams from all EY service lines, geographies and sectors, playing a vital role in the delivery of the EY growth strategy. From accountants to coders to advisory consultants, we offer a wide variety of fulfilling career opportunities that span all business disciplines. In GDS, you will collaborate with EY teams on exciting projects and work with well-known brands from across the globe. We’ll introduce you to an ever-expanding ecosystem of people, learning, skills and insights that will stay with you throughout your career.
- Continuous learning: You’ll develop the mindset and skills to navigate whatever comes next.
- Success as defined by you: We’ll provide the tools and flexibility, so you can make a meaningful impact, your way.
- Transformative leadership: We’ll give you the insights, coaching and confidence to be the leader the world needs.
- Diverse and inclusive culture: You’ll be embraced for who you are and empowered to use your voice to help others find theirs.
EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets.
If you can demonstrate that you meet the criteria above, please contact us as soon as possible.