Expoint - all jobs in one place

JPMorgan Site Reliability Engineer III - AWS 
United States, Texas, Plano 
565100254

26.06.2024

Job responsibilities

  • Guides and assists others in building appropriate-level designs and gaining consensus from peers where appropriate
  • Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team
  • Leads initiatives to improve the reliability and stability of web hosting platforms, using data-driven analytics to improve service levels
  • Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers
  • Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise
  • Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
  • Provides comprehensive and ongoing guidance, tools, and solutions to support the firm's growth
  • Works toward becoming an expert on the applications and platforms under your influence while understanding their interdependencies and limitations
  • Documents and shares knowledge within your organization via internal forums and communities of practice

Required qualifications, capabilities, and skills

  • Formal training or certification on site reliability engineering concepts and 3+ years applied experience
  • AWS exposure (understanding of and working experience with AWS applications, and understanding of resiliency, scalability, observability, monitoring, etc.)
  • Experience in provisioning AWS infrastructure through Terraform
  • Experience as an SRE in complex, mission-critical applications involving a multitude of components of varying technical generations
  • Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform
  • Advanced knowledge in site reliability culture and principles with demonstrated ability to implement site reliability within an application or platform
  • Advanced knowledge and experience in observability, monitoring, alerting, and telemetry collection using tools such as CloudWatch, Grafana, Dynatrace, Prometheus, Splunk, etc.
  • Fluency in at least one programming language or tool (e.g., Python, Terraform, Ansible, Java Spring Boot, Shell Scripting, .NET, etc.)
  • Strong communication skills with ability to mentor and educate others on site reliability principles and practices
  • Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines
  • Drive to self-educate and evaluate new technology

Preferred qualifications, capabilities, and skills

  • Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
  • Ability to initiate and implement ideas to solve business problems