Expoint - all jobs in one place

JPMorgan Site Reliability Engineer AI/ML Platform 
United Kingdom, Scotland 
14686252

29.04.2025

Job responsibilities

  • Collaborate with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
  • Implement infrastructure, configuration, and network as code for the applications and platforms in your remit
  • Understand service level indicators and use service level objectives to proactively resolve issues before they impact customers
  • Design and implement solutions to enhance the reliability and scalability of AI/ML
  • Partner with product engineering teams to ensure the AI/ML systems are reliable and
    high performing.
  • Develop observability, security, automation, and FinOps tools and orchestration.
  • Build strong cross-functional relationships that foster engagements across the
    organization and deliver solutions to user problems.
  • Debug and solve issues in a production environment, identify root cause and
    remediate.
  • Participate in on-call rotations, incident management, and escalation workflows.
  • Take full ownership of problems, develop solutions, and acquire new knowledge to
    complete the task.
  • Mentor and guide junior engineers.

Required qualifications, capabilities, and skills

  • Formal training or certification on Site Reliability Engineering concepts and applied experience
  • Expertise in SRE principles and in the reliability, scalability, and performance of
    applications and infrastructure.
  • Expertise in programming with Python and in Infrastructure as Code tools such as
    Terraform.
  • Experience working with distributed systems and cloud-native architecture in AWS.
  • Systematic problem-solving and troubleshooting skills in complex systems.
  • Excellent communication skills and ability to represent and present business and
    technical concepts to stakeholders.
  • Self-managed and self-motivated, with a strong sense of ownership, urgency, and drive

Preferred qualifications, capabilities, and skills

  • Prior experience working in AI, ML, or Data engineering.
  • Expertise in container orchestration/Kubernetes.
  • Prior experience developing Automation frameworks/AI Ops.
  • Prior experience building observability and telemetry tools.