Expoint - all jobs in one place

The place where the best experts and companies meet

Apple: Site Reliability Engineering (SRE) Manager, iCloud
United Kingdom, England, London 
925404440

06.06.2024
Description
As a Site Reliability Engineering Manager, your responsibilities include:
  • Manage staging and production environments with the goal of maximizing availability
  • Promote observability of systems for monitoring, alerting, and metrics reporting
  • Advocate best practices of reliability engineering
Key Qualifications
  • Experience with large scale distributed systems, especially ML infrastructure and services including LLMs, Generative AI, and transformers
  • Demonstrable success leading engineering teams - ideally SRE or Production Engineering
  • Knowledge of core operating system principles, networking fundamentals, and systems management
  • Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts
  • Experience with hiring and leading engineers
  • Professional experience in an engineering leadership position
Education & Experience
Bachelor's or Master's degree in computer science or an equivalent field.