Expoint – all jobs in one place

JPMorgan Senior Lead Site Reliability Engineer 
United States, New Jersey, Jersey City 
609074320

Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability.

Job responsibilities

  • Creates high quality designs, roadmaps, and program charters that are delivered by you or the engineers under your guidance

  • Provides advice and mentoring to other engineers and acts as a key resource for technologists seeking advice on technical and business-related issues

  • Demonstrates site reliability principles and practices every day and champions the adoption of site reliability throughout your team

  • Collaborates with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt

  • Evolves and debugs critical components of applications and platforms

  • Employs AI-driven solutions to streamline processes and enhance operational efficiency

  • Uses data-driven analytics and AI technologies to automate detection, diagnosis, and resolution processes, elevate service levels, and drive continuous improvement

  • Engages stakeholders to establish realistic service level objectives and error budgets, ensuring alignment with customer expectations

  • Exhibits advanced technical proficiency in one or more domains, proactively addressing technology-related bottlenecks

  • Serves as the primary contact during major incidents, swiftly identifying and resolving issues to prevent financial losses

  • Provides comprehensive and ongoing guidance, tools, and solutions to support the firm’s growth

Required qualifications, capabilities, and skills

  • Formal training or certification in software engineering concepts and 5+ years of applied experience

  • Advanced knowledge of site reliability culture and principles, with demonstrated ability to implement site reliability within an application or platform

  • Advanced knowledge of and experience in observability, including white-box and black-box monitoring, service level objectives, alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk

  • Fluency in at least one programming language or framework (e.g., Python, Java with Spring Boot, .NET)

  • Advanced knowledge of software applications and technical processes with considerable depth in one or more technical disciplines

  • Ability to communicate data-based solutions with complex reporting and visualization methods

  • Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform)

  • Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker)

  • AWS Cloud experience across multiple areas

  • Recognized as an active contributor to the engineering community

  • Continues to expand network and leads evaluation sessions with vendors to see how offerings can fit into the firm’s strategy

Preferred qualifications, capabilities, and skills

  • Experience in banking / financial domain is preferred

  • AWS Cloud solution architect certification preferred

  • Experience managing and optimizing various types of databases, including relational and NoSQL databases

  • Experience with game days, chaos experiments, or failure-mode analysis to improve service robustness

  • A background in mentoring engineers or leading technical knowledge-sharing, especially around AI and SRE best practices