Expoint – all jobs in one place

Apple Site Reliability Engineer, Enterprise Technology Services
United States, West Virginia 
373075803

04.09.2025
  • Architect Scalable Infrastructure: Design, evolve, and review highly reliable, performant, and cost-efficient cloud-native and hybrid infrastructure using IaC, containers, and microservices principles.
  • Support Cryptographic Systems at Scale: Design and operationalize scalable, secure integrations with Hardware Security Modules (HSMs) for sensitive workloads, key management, and cryptographic operations.
  • Drive SRE Best Practices: Define and implement service-level indicators (SLIs), objectives (SLOs), and agreements (SLAs) to guide engineering teams towards reliability and observability goals.
  • Incident Architecture & Prevention: Serve as a technical lead during major incidents. Partner with security and platform teams to conduct deep post-incident reviews, drive systemic improvements, and establish preventive architectural controls.
  • System Design & Tooling: Build and maintain reusable tooling, automation frameworks, and reliability platforms (observability, alerting, chaos testing, auto-scaling, failover).
  • Reliability as Code: Champion resilience engineering via automation pipelines, CI/CD integrations, canary releases, and chaos engineering principles.
  • Multi-Cloud and Hybrid Systems: Design, assess, and guide architecture decisions across AWS, GCP, AliCloud, and on-premises infrastructure. Ensure consistency, interoperability, and regulatory compliance.
  • Security & Compliance: Ensure architectural patterns are aligned with security standards, compliance requirements, and audit readiness.
  • 7+ years of experience in SRE, DevOps, or Infrastructure Engineering roles, with 2+ years in an architectural or principal engineering capacity.
  • Deep expertise in cloud infrastructure (AWS, GCP, or AliCloud) and container orchestration (Kubernetes, EKS).
  • Proven experience with Infrastructure as Code (Terraform, Pulumi, CloudFormation).
  • Strong understanding of distributed systems, networking, and systems design at scale.
  • Proficiency in at least one programming or scripting language (Python, Go, Bash, or similar).
  • Experience designing observability stacks (Prometheus, Grafana, Datadog, OpenTelemetry, ELK, etc.).
  • Solid background in CI/CD tools and modern deployment strategies (ArgoCD, Spinnaker, GitOps).
  • Familiarity with security best practices in cloud and containerized environments.
  • Familiarity with HSMs and cryptographic operations at scale is a plus.