Expoint - all jobs in one place

Microsoft Principal Site Reliability Engineering Manager 
United Kingdom 
808933960

30.07.2024

Every minute of every day, customers stake their entire business and reputation on the Microsoft Cloud. The Azure Customer Experience (CXP) team believes that when we meet our high standards for quality and reliability, our customers win. If we falter, our customers fail their end-customers. Our vision is to turn Microsoft Cloud customers into fans.

problem-solvers. We orchestrate deep engagements in areas like incident management, support and enablement. We analyze and amplify those customer voices, both within our own team, and across the Cloud + AI team, bringing the customer connection to the Quality vision for Azure. We innovate ways to scale what we learn across our customer base. Diversity and inclusion are central to who we are, how we work, and what we enable our customers to achieve. We know that empowering our customers starts with empowering our team to show up authentically, work in ways that are best for them, and achieve their career goals.

Come join us and surround yourself with people who are passionate about cloud computing and believe that extraordinary support is critical to customer success.

– a global Azure Engineering Support organization (part of Azure Customer Experience group) that is customer-obsessed, and support engaged, with an engineering mindset.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

SRE Manager, you for a highly sing directly to lifting the overall platform reliability and security by working horizontally across the customer hardware and Azure. This role enables you to contribute to the understanding of Site Reliability Engineering and to improving our customer experience.

You will work closely and in partnership with the Operations Lead and the Management Lead.

Minimum Qualifications

  • Bachelor's Degree AND extensive IT experience in areas including:
  • Live production systems (ideally mission critical)
  • Implemented and run Site Reliability Engineering teams (SRE)
  • Have experience building high-performing operations teams
  • Customer relationship management skills
  • OR equivalent experience.
  • Extensive people management experience.
  • Extensive experience managing operational teams and/or engineering teams with operational responsibilities.

Additional or Preferred Qualifications

  • Operations Leadership: Experience in a leadership role in technical support for mission critical solutions.
  • Experience with HPC infrastructure and workflows
  • Incident Management Experience: Experience in leading incident and crisis management for large mission critical IT solutions.
  • Site Reliability Engineering: Leadership with modern site reliability practices. Run teams and services that improved reliability of systems, automated remediation of issues, or improved scalability.
  • Cloud Experience: Exposure to Cloud, SaaS, and virtualization concepts and performance concerns.
  • Operational Excellence: Experience of driving Operational Excellence in operating large complex mission critical solutions. Significant experience in delivering large scale operational services in a complex change environment, organisational redesign and quality improvement programmes.
  • Excellent Communication: Must have the ability to empathize with customers and convey confidence. Able to explain highly technical issues to varied audiences. Able to prioritize and advocate customer’s needs to the proper channels. Take ownership and work towards a resolution.
  • Customer Obsession: Passion for customers and focus on delivering the right customer experience.
  • Growth Mindset: Openness and ability to learn new skills and technologies in a fast-paced environment.

Responsibilities

most complex customers. The Principal SRE Manager will be responsible for the following:

  • SRE Leadership:
  • T
  • Creation of Service Level Indicators (SLIs) for the customer system
  • The elimination of toil and driving automation throughout operations
  • Ensuring continuous improvement through a live-site first culture
  • Partner with the Ops Lead and
  • Relationship:
  • Develop and maintain strong relationships with the customer’s key stakeholders.
  • Leverage your skills such as active listening, problem solving, and being transparent to address difficult customer situations requiring you to set and manage customer expectations while maintaining a trusted customer relationship.
  • Act as a voice of the customer and leverage the customer’s feedback to provide input to our product team.
  • Be the voice of the customer within the Azure Engineering community.
  • :
  • Lead the service reliability engineering team as they transform our customer from traditional service management to modern Service Reliability Engineering.
  • Lead the Technical Service Owners for key solutions, ensuring technical excellence and continuous enhancements of solutions.
  • Manage a 24x7 SRE team with an on-call rotation able to respond to the customer’s most critical incidents in 5 minutes or less.
  • Build a team culture that thrives on customer obsession and delivery excellence, where team members go above and beyond their immediate issue at hand to delight the customer and can predict and resolve the next issue before the customer reports it.
  • is delighted with their Azure cloud experience.
  • Drive up-leveling of team’s technical skills.
  • Leverage resources to help employees develop skills and support their career interests.
  • Remove barriers to agility to enable the team to shift priorities quickly without losing productivity.
  • Create an inclusive work environment where every employee can effectively engage and wants to be part of the team. Provide ongoing feedback that helps direct reports improve their performance. Promote a positive environment across the organization by modeling behavior that promotes good morale.
  • Other duties as assigned.

As this role is a key operational position, working out of normal office hours can be anticipated (e.g. operational escalations).

  • This role is based in the UK. Regular travel to the customer site in south-west England is required (c. 2-3 days per week at first, reducing to a few times per month as an estimation).
  • Travel between Microsoft offices in Reading & London should be expected when required.