RESPONSIBILITIES
As a Software Engineer in one of our Azure SRE teams, you will be responsible for improving the reliability of key Azure products.
The Azure SRE key focus areas are:
- Defining system reliability goals through Service Level Objectives (SLOs). Enhancing production posture with targeted improvements in observability and operability (telemetry, alerting, incident/change management, safe deployment practices).
- Building reusable automation and processes that help multiple teams meet their reliability goals. Influencing product architecture and roadmaps to ensure customer-experienced reliability is a core design principle.
- Contributing directly to product code to achieve reliability outcomes. Leveraging AI to proactively detect anomalies, predict incidents, and automate operational workflows - scaling reliability efforts across complex systems.
- We are looking for engineers passionate about the above areas who are also interested in:
- Providing technical leadership across multiple Azure teams. Mentoring others on SRE principles, practices, and tools as well as AI usage to boost software development productivity.
- Designing and developing large-scale distributed software services and solutions. Delivering “best-in-class” engineering by ensuring services are modular, secure, reliable, testable, diagnosable, observable, and reusable.
- Collaborating with internal and external partners to support team goals. Balancing pragmatism with vision—driving continuous improvements in process and codebase. Building automation to prevent or remediate service issues before they impact users.
- Driving innovation in large-scale operations by applying cutting-edge AI tools and techniques to reduce operational toil and scale reliability engineering across complex systems. Gaining a working understanding of Microsoft businesses and contributing to cohesive, end-to-end user experiences.