IBM Technical Architect Azure Site Reliability 
India, Karnataka, Bengaluru 
361566640

15.07.2024

In this role, you’ll work in one of our IBM Consulting Client Innovation Centers (Delivery Centers), where we deliver deep technical and industry expertise to a wide range of public and private sector clients around the world. Our delivery centers offer our clients locally based skills and technical expertise to drive innovation and adoption of new technology.

Your Role and Responsibilities
The Site Reliability Engineer (SRE) is a critical role in cloud-based projects. An SRE works with the development squads to build automation for platform and infrastructure management and provisioning, as well as service monitoring, applying the same methods used in software development. SREs bridge development and operations by bringing a software engineering mindset to system administration. They split their time between operations and on-call duties and developing systems and software that improve site reliability and performance.

Required Technical and Professional Expertise

  • 12+ years of overall experience.
  • Serve as the subject matter expert on all matters related to Azure cloud services. Strong working experience with the Azure platform and Azure DevOps.
  • Define and establish service level objectives (SLOs) and service level indicators (SLIs) for critical services to ensure they meet performance and reliability targets.
  • Respond to and mitigate incidents affecting system availability or performance.
  • Participate in post-incident reviews to identify root causes and implement preventative measures.
  • Monitor system performance, set up alerts to flag potential problems, investigate those alerts, and take corrective action as needed.

Preferred Technical and Professional Expertise

  • Develop and maintain tools and automation scripts for deployment, configuration management, monitoring, alerting, and incident response.
  • Identify and address performance bottlenecks in the cloud infrastructure and applications.
  • Design and implement strategies for fault tolerance and disaster recovery to ensure system resiliency.