JPMorgan Site Reliability Engineer III 
United States, Texas, Houston 
668738333

01.07.2025

Job responsibilities

  • Write high-quality, maintainable, and well-tested software to develop reliable and repeatable solutions to complex problems.
  • Collaborate with product development teams to design, implement, and manage CI/CD pipelines to support reliable, scalable, and efficient software delivery.
  • Partner with product development teams to capture and define meaningful service level indicators (SLIs) and service level objectives (SLOs).
  • Develop and maintain monitoring, alerting, and tracing systems that provide comprehensive visibility into system health and performance.
  • Contribute to design reviews to evaluate and strengthen architectural resilience, fault tolerance, and scalability.
  • Uphold incident response management best practices, and champion blameless postmortems and continuous improvement.
  • Debug, track, and resolve complex technical issues to maintain system integrity and performance.
  • Champion and drive the adoption of reliability and resiliency best practices.

Required qualifications, capabilities, and skills

  • Formal training or certification in software engineering concepts and 3+ years of applied experience
  • Proficiency in site reliability culture and principles, and familiarity with how to implement site reliability within an application or platform
  • Experience analyzing, troubleshooting and supporting large-scale systems.
  • Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., cloud, artificial intelligence, Android)
  • Experience in observability, such as white-box and black-box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
  • Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
  • Familiarity with containers and container orchestration technologies such as ECS, Kubernetes, and Docker
  • Practical experience building production-grade software in at least one programming language such as Java, Python, or Go.
  • Solid understanding of the fundamentals of distributed systems, and reliability patterns for achieving redundancy, fault tolerance, and graceful degradation.
  • Solid understanding of networking concepts, including TCP/IP, routing, firewalls, and DNS.
  • In-depth knowledge of Unix/Linux, including performance tuning, process and memory management, and file system operations.

Preferred qualifications, capabilities, and skills

  • Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
  • Ability to initiate and implement ideas to solve business problems
  • Practical experience of one or more of the following:
    • building, supporting, and troubleshooting JVM-based applications, including experience with tools such as JConsole or VisualVM.
    • use and support of SQL and in-memory database technologies.
    • building and maintaining CI/CD pipelines using modern tools such as GitHub Actions or GitLab CI/CD.
    • observability and monitoring tools such as Prometheus, Grafana, or OpenTelemetry.
    • containers and orchestration platforms such as Docker, Kubernetes, or Amazon ECS.
    • cloud technologies such as AWS or GCP, including deployment, management, and optimization of cloud-based applications.
    • performance and chaos testing tools such as Gremlin, Chaos Mesh, and LitmusChaos.
  • Experience working in the financial/fintech industry.