Expoint - all jobs in one place

Microsoft Principal Site Reliability Engineer 
Austria, Vienna, Vienna 
221448218

25.06.2024

As a Principal Site Reliability Engineer, you will be an integral member of a team that is working to empower clinicians to achieve more with groundbreaking healthcare-oriented copilots and to provide a secure, scalable, and reliable solution. The ideal candidate will be excited about waking up every morning to apply their skills in automation, CI/CD, and IaC to develop and deploy new technologies and experiences centered around driving positive healthcare outcomes.

Minimum Qualifications:

· Multi-year technical experience in software engineering, network engineering, systems administration, or Site Reliability Engineering

· A Bachelor’s Degree in Computer Science, Information Technology, or related field

· Deep knowledge of the Azure Cloud

· Deep knowledge of Azure Kubernetes Service (AKS)

· Ability to technically lead projects and mentor less senior team members

Language Qualifications:

English Language: fluent in reading, writing, and speaking.

The salary for this role starts from 114,700 euros per annum, depending on experience.

Responsibilities

Design components carefully, handle errors properly, and write clean, well-factored code with good tests and good maintainability.

Responsibilities include:

  • Demonstrates expertise in distributed systems design, interactions between cloud technology layers and components, common dependencies at scale, and the code that defines infrastructure. Can identify and recommend optimal configurations of cloud technology solutions and modify the code base that defines systems or cloud technologies to improve the reliability and operability of supported products with minimal guidance from other engineers.
  • Develops an understanding of the code, features, and operations of specific products at scale as required to contribute to incremental improvements in product availability, reliability, efficiency, observability, and/or performance; participates in onboarding, code/design reviews, and regular meetings with the engineering teams that develop and/or manage those products.
  • Researches and maintains an awareness of industry trends, advances in distributed systems and cloud technologies, new tools, and/or processes for maintaining and improving product availability, reliability, efficiency, observability, and/or performance. Contributes to the implementation of new solutions within their team by identifying ways they can be applied to solve persistent problems.
  • Leverages technical expertise in large-scale distributed systems and specific products, as well as objective insights drawn from analyses of production telemetry data, to suggest changes or add-ons to product features or code to improve the availability, reliability, efficiency, observability, and performance of product components or features supported by their team.
  • Independently develops code or scripts that automate the performance of repetitive and easily scalable operations processes (e.g., monitoring, alerting, deploying products and updates) across components and features of products operating at scale.
  • Independently uses existing tools and/or models to troubleshoot problems or flaws affecting the availability, reliability, performance, and/or efficiency of components and features; proposes solutions that will resolve and prevent recurring issues and brings them to the attention of their Site Reliability Engineering (SRE) and/or product engineering teams.