Expoint - all jobs in one place

The place where the best experts and companies meet

Bank of America - Site Reliability Engineer, Enterprise Cloud Platforms, Global Technology
Australia, New South Wales, Sydney
78791808

01.04.2025

Job Description:

We are seeking Site Reliability Engineers (SREs) to design, build, and maintain our next-gen platforms. The role provides the opportunity to work with a wide range of technologies and to build the unique perspective that comes from integrating disparate services (both on-prem and off-prem) that must interact seamlessly with each other. You will work with colleagues who are fun, smart, hardworking, and driven. You will be part of a growing global team that gives you room to innovate and be creative.

Position Summary

  • Collaborates with a diverse set of engineers, architects, and teams to design, develop, test, and implement secure, robust, highly available, and scalable solutions for BofA’s External Cloud Platform.
  • Collaborates with other software engineers and teams to design and implement deployment approaches using highly scalable, automated continuous integration and continuous delivery pipelines.
  • Responsible for all aspects of reliability; collaborates with technical experts, key stakeholders, and team members to resolve complex problems, owning each issue until you are sure it will not recur.
  • Deep understanding of SRE practices, service level indicators, and service level objectives; proactively utilize them to resolve issues before they impact customers.
  • Gather, analyze, synthesize, and develop visualizations and reporting from large, diverse data sets in service of continuous improvement of the platform.
  • Implement infrastructure, configuration, and network as code for the applications and platforms in your remit.
  • Identify opportunities to eliminate toil and automate the triage of issues to improve overall operational stability.
  • Collaborate with a global team to identify, analyze, and resolve platform vulnerabilities.
  • Proactively promotes the adoption of site reliability engineering best practices within the team and organization.
  • Participates in 24x7 on-call coverage under a follow-the-sun model and performs blameless postmortems (RCAs) as needed.

Required Skills:

  • 7 years of combined experience in SRE, software development, or infrastructure engineering (or 4 years with an advanced degree in Computer Science or a related technical field).
  • 3+ years of hands-on experience building and maintaining cloud platforms on a major cloud service provider.
  • Strong experience in implementing, monitoring, and maintaining a highly scalable and resilient Data Services platform on major CSPs such as AWS, Azure, or GCP.
  • Strong experience with monitoring tools such as Grafana, Prometheus, Splunk, or Dynatrace, as well as cloud-native tools such as CloudWatch and CloudTrail, Azure Monitor, and Log Analytics.
  • Proficiency in implementing, monitoring, and maintaining a Databricks, RDS, or OpenAI platform.
  • Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET; 5+ years of applied experience in Python/Java.
  • Proficiency in implementing CI/CD pipelines with tools such as Git and Jenkins; familiarity with the GitOps model.
  • Advanced knowledge of networking (firewalls, DNS, load balancing, proxies, etc.).
  • Advanced understanding of Linux and Windows operating systems, including shell scripting.
  • Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
  • Proven ability to work independently with minimal supervision and as part of a global team with direct responsibilities and an ability to juggle competing priorities and adapt to changes in project scope.

Desired Skills

  • Strong experience working with a complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and SSO solutions such as PingIdentity or Okta.
  • Proficiency in creating automation using Python, Terraform, or Ansible.
  • Proficiency in implementing, monitoring, and maintaining a Databricks, CosmosDB, or OpenAI platform.
  • Experience in implementing, monitoring, and maintaining a highly scalable and resilient enterprise platform on Microsoft Azure using native services related to compute, storage, networking, security, and observability.
  • Experience with compute and containerization technologies such as EC2, EKS, Fargate, OpenShift, or Kubernetes.
  • Understanding of cost management, inventory management, and the FinOps model.