Expoint - all jobs in one place

Finding a high-tech job at the best companies has never been easier


Bank Of America Cloud Senior Site Reliability Engineer 
United States, New Jersey, Jersey City 
747729998

17.05.2024

Senior Site Reliability Engineering, Hybrid Cloud Container Platform, Enterprise Cloud Platforms

Our Cloud Service Reliability Engineers (cSREs) ensure that our Cloud services meet the reliability and uptime requirements of our demanding enterprise customers. This is achieved through sound engineering practices, resilient design, and a well-defined, effective global on-call rotation that runs 24x7.

The role provides the opportunity to work with a wide range of technologies and a unique perspective on how various services (on-prem/off-prem) interact with each other. You will work with colleagues who are as smart, hardworking, and driven as you. You will work in a team that keeps growing and innovating, and that gives you room to be proactive and creative.

Position Summary:

  • Responsible for the reliability and support of the Container PaaS platform, on-prem and off-prem (Azure/AWS/Google)

  • Monitor and troubleshoot performance, connectivity, security, and other issues in the Container PaaS platform (OpenShift) and Azure (AKS) environments

  • Perform deep dives into systemic and latent reliability issues, and take part in incident management and problem management

  • Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues

  • Perform blameless RCA and partner with engineering and operations teams across the organization to roll out fixes

  • Identify and drive opportunities to improve automation for the PaaS services; scope and create automation for deployment, management, and visibility of our services.

  • Evaluate and automate the scaling and capacity requirements within PaaS environments

  • Partner with risk and compliance teams to bring visibility and implement the right controls and policies in the PaaS platform

  • Ensure resiliency during implementation and identify/fix resiliency problems by collaborating with engineering teams

  • Be a key stakeholder in the design of cloud services and work with architecture, engineering, and product teams

  • Participate in 24x7 on-call coverage under a follow-the-sun model

Required Skills:

  • BS/MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience.

  • Minimum of 8 years of hands-on experience supporting Kubernetes/OpenShift/Container PaaS platforms

  • Experience with Python, Ansible and shell scripting

  • Kubernetes/OpenShift/Terraform certifications are a plus

  • Strong experience in major services related to Compute, Storage, Network and Security

  • Experience with monitoring tools like Prometheus and Dynatrace, as well as cloud native tools like Azure Monitor and Log Analytics

  • Strong understanding of, and experience working with, complex Active Directory and IAM controls

  • Advanced knowledge of DNS, DHCP, Kerberos and Windows Authentication

  • Experience with CI/CD tools (Git, Jenkins) and the GitOps model

  • Excellent understanding of Linux/Windows operating systems administration

  • Systematic problem-solving approach, sense of ownership and drive

  • Ability to juggle competing priorities and adapt to changes in project scope.

  • Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.

  • Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.

Desired Job Skills:

  • Experience with OpenShift and managed Kubernetes services such as AKS, EKS, or GKE

  • Experience with Terraform, ArgoCD, Tekton, and Knative technologies

  • Experience in agile deployment methodologies (GitOps)

  • Knowledge of various container runtimes

  • Familiarity with the operator deployment pattern.

  • Experience working in a highly available multi-datacenter environment

  • Experience working with monitoring tools such as Prometheus, Splunk, Dynatrace, Sysdig, or similar tools.

  • Understanding of cost management, inventory management, and the FinOps model

1st shift (United States of America)