Expoint - all jobs in one place

The place where experts and the best companies meet

Microsoft Site Reliability Engineer 
Czechia, Prague, Prague 
394481772

03.04.2024

Qualifications

Bachelor's or Master's degree in Computer Science, Data Science, AI, or a related field.
Awareness of, and ability to reason about, modern software & systems architectures, including load-balancing, queueing, caching, distributed systems failure modes, microservices, and so on.
Associated troubleshooting skills, including the ability to follow Remote Procedure Call (RPC) chains across arbitrary network steps. Consequent understanding of monitoring in distributed systems.
Practical experience running large scale online systems is always an advantage.

We also consider qualified applicants regardless of criminal histories, consistent with legal requirements.

Responsibilities

Technical Knowledge and Domain-Specific Expertise

Researches and maintains deep knowledge of industry trends as well as advances in large-scale distributed systems and cloud technologies; identifies opportunities to create, implement, and/or optimally utilize new tools, technologies, and/or processes to solve ambiguous problems and improve product availability, reliability, efficiency, observability, and/or performance.

Applies advanced statistical and machine learning techniques to analyze large datasets and extract meaningful insights.

Has experience with all aspects of high-throughput, multi-tenant services, including the ability to understand and carefully design workflows, handle errors properly, and write clean, well-factored code that is well tested and maintainable.

Driving Operational Excellence

Develops code, scripts, systems, or platforms that automate moderately complex but repetitive operations processes (e.g., monitoring, alerting, deploying products and updates, debugging) at scale; reviews existing automation code and scripts to evaluate reusability, extensibility, and scalability within an organization.


Analyzes data from telemetry pipelines and monitoring tools that detail operations metrics (e.g., availability, reliability, performance, efficiency) of systems, platforms, or products operating at scale.

Mentors and coaches less experienced engineers to help them identify and propose relevant solutions.