Expoint - all jobs in one place

Apple Staff Site Reliability Engineer Kubernetes ASE 
United States, Texas, Austin 
304227964

19.12.2024
Description
This role goes beyond traditional SRE work. You'll not only keep our systems running smoothly but also collaborate closely with developers and architects. Together, you'll design and implement solutions for improved stability, security, and scalability.
Minimum Qualifications
  • Kubernetes Expertise: Deep understanding of Kubernetes architecture, components, and best practices. Proficiency in managing Kubernetes clusters, deploying applications, and automating workflows using tools like Helm and Kustomize.
  • Cloud Platforms: Experience with major public cloud providers and their cloud-native services. Familiarity with infrastructure as code (IaC) tools like Terraform or Ansible.
  • SRE Principles: Adherence to SRE principles, including monitoring, alerting, error budgets, fault analysis, and automation. Strong focus on reliability, availability, and performance.
  • Telemetry and Observability: Expertise in implementing and coordinating telemetry using tools like Splunk, Grafana, and Prometheus. Ability to analyze and troubleshoot complex system issues.
  • Programming: Proficiency in Go (Golang) for developing automation scripts, tools, and custom applications.
  • Collaboration: Excellent interpersonal and communication skills. Ability to work effectively in cross-functional teams and foster a collaborative environment.
  • BS or MS in Computer Science or equivalent proven experience
Preferred Qualifications
  • Production & Non-Production Environments: Operate, monitor, and prioritize tasks across all production and non-production environments, demonstrating strong operational focus.
  • Innovative Problem Solver: Design, build, and implement innovative software solutions to address existing challenges and proactively anticipate future needs.
  • Documentation & Collaboration: Create clear alert handling procedures and runbooks, ensuring knowledge transfer and collaboration within and between SRE teams.
  • Automation Champion: Automate service deployment and orchestration in the cloud environment, as well as other routine processes, to streamline operations.
  • Resilience & Growth: Actively participate in capacity planning, scale testing, and disaster recovery exercises, ensuring our systems remain resilient.
  • Team Player: Foster strong relationships and provide support to partner teams like engineering, QA, and program management.
Additional Requirements
  • Apple is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.