Microsoft Site Reliability Engineer 
Romania, Bucharest 
81438933

17.07.2024

As a Site Reliability Engineer, you’ll work with a breadth of partners across Microsoft, including developers in service teams, hardware engineers, datacenter technicians, supply chain managers, and business leaders, to rapidly debug and resolve issues delaying the carefully orchestrated buildout sequence. You’ll drive continuous improvements with these teams to prevent repeat issues and address common classes of issues across the Azure software stack through design reviews and problem management.

This opportunity will enable you to gain unparalleled system-wide knowledge of how the Azure cloud is built and maintained. The contacts you make with experts will enable you to deep dive into services and new technologies and partner with them on improvements. You’ll be stretched to automate mitigations tactically and to analyze data strategically, identifying problem areas to drive prioritization.


Required Qualifications:

  • Technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field

Other Requirements:

  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

  • Technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND technical experience in software engineering, network engineering, or systems administration

Responsibilities:

  • Develops a foundational understanding of distributed systems design, interactions between cloud technology layers and components, basic dependencies at scale, and the code that defines infrastructures. Can contribute to the code base that defines components or features of systems or cloud technologies to improve the reliability and operability of supported products, with direction from other engineers.
  • Supports ongoing engagements with product engineering teams by participating in code/design reviews, regular meetings, on-call rotations, and incident responses throughout product development and operations cycles; draws insights from engagements with product engineering teams and basic analyses of telemetry data to propose potential improvements to code and designs for a defined set of product components or features with guidance from other engineers.
  • Implements simple configuration and data changes across a predefined range of product components or features with guidance from other engineers to develop an understanding of how configurations, binaries, and data can be managed using code, tooling, and automation.
  • Develops an understanding of how to safely and reliably manage changes in production by using existing tools and automation to enable product engineering teams to implement changes across a defined range of components or features, with direction from other engineers.
  • Uses existing tools to troubleshoot problems or flaws affecting the availability, reliability, performance, and/or efficiency of components or features with guidance from other engineers. Suggests potential solutions to resolve and prevent recurring issues and brings them to the attention of other engineers or team leads.
  • Responds to incidents during regular on-call rotations by identifying the level of impact, troubleshooting basic issues, and deploying appropriate fixes to resolve root cause(s); alerts product teams or owners to major customer-impacting issues and escalates the resolution of complex issues and/or those affecting multiple components or features to other engineers as needed. Shares details related to incidents and their resolution through post-mortem reports and during regular review meetings.