
JPMorgan Lead Site Reliability Engineer Cloud Technology 
Singapore 
723854064

15.07.2025

Public Cloud SRE is responsible for engineering and operating JPMC’s cloud infrastructure and platforms, ensuring reliability, resiliency, and security. We have an open Senior Software Engineer, Site Reliability position to build the infrastructure and tooling for JPMC’s Public Cloud Platform.

Job responsibilities

  • Engage in and improve the lifecycle of cloud services, from inception through design, deployment, and operation
  • Automate repeated manual tasks and develop tools to improve the efficiency of the platform and infrastructure
  • Analyze defects, propose improvements, and drive efficiencies in systems and processes
  • Help develop new cloud engineering strategies and implementations for the firm
  • Ensure the reliability, availability, and performance of the cloud infrastructure and platform
  • Demonstrate site reliability principles and practices every day and champion the adoption of site reliability throughout your team
  • Develop observability and telemetry tools
  • Author and improve the quality of technical engineering documentation
  • Debug and solve issues in a production environment
  • Participate in SRE on-call rotations and escalation workflows

Required qualifications, capabilities, and skills

  • Formal training or certification in software engineering or site reliability engineering and 5+ years of applied experience
  • Bachelor’s Degree in Computer Science or equivalent
  • Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform
  • Expertise in building solutions with AWS cloud services, knowledge of Infrastructure as Code tools such as Terraform, and fluency in at least one programming language such as Python or Java
  • Proficiency and experience in observability, including white-box and black-box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
  • Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform)
  • Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker) and with troubleshooting common networking technologies and issues
  • Ability to identify and solve problems related to complex data structures and algorithms
  • Drive to self-educate and evaluate new technology and ability to teach team members
  • Ability to collaborate across different levels and stakeholder groups, with excellent communication skills for working with stakeholders and domain experts across the company to design solutions to user problems
  • Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive

Preferred qualifications, capabilities, and skills

  • AWS certifications are a bonus