Apple Infrastructure DevOps Engineer
United States, Texas, Austin
543581568

12.05.2025

You will be responsible for the end-to-end infrastructure lifecycle supporting large-scale datacenter and HPC environments across multiple continents. You will:
Design and operate scalable Slurm-based HPC clusters, distributed globally across 2,000+ nodes.
Lead infrastructure automation for provisioning, monitoring, and configuration management of compute and enterprise services.
Manage and tune high-availability services and support site-aware routing, load balancing, and DNS-based traffic distribution.
Serve as the expert in Active Directory integrations, trust relationships, replication latency troubleshooting, and directory service hardening.
Architect virtualization-based solutions where appropriate to support auxiliary services, container workloads, and hybrid edge compute nodes.
Oversee secure data replication strategies between sites, integrating load-balancer VIPs and geo-distributed failover configurations.
Provide root-cause analysis for performance bottlenecks, host instability, or data inconsistencies across global platforms.
Work closely with platform owners, security teams, and datacenter engineers to evolve infrastructure towards a zero-touch, self-healing architecture.

A Bachelor’s degree in Computer Science with several years of relevant experience.
Proven experience in a DevOps role in an enterprise environment with private and public cloud exposure.
Proven experience in a Systems Admin or Systems/IT support role in an enterprise environment.

7+ years of experience operating large-scale, production-grade datacenter or HPC environments (2,000+ nodes).
Expert-level Windows Server administration, including Active Directory, GPO, DNS, DHCP, and DFS for distributed enterprise environments.
Deep experience with RHEL/CentOS and infrastructure tuning for high-performance, low-latency workloads.
Advanced knowledge of global networking concepts: routing, DNS failover, site-aware load balancers, VIP configuration, and traffic shaping.
Strong hands-on experience with enterprise virtualization platforms (VMware vSphere/ESXi, Hyper-V) for production and edge workloads.
Proficient in infrastructure automation and scripting with PowerShell, Python, and Ansible.
Experience with InfiniBand fabrics and high-bandwidth data interconnects in compute environments.
Deep understanding of infrastructure observability using Prometheus, Grafana, Nagios, Splunk, or equivalent tools.
Proven success managing global replication services and multi-region compute/data platforms.
Excellent cross-functional communication and documentation skills, with the ability to influence and mentor across global teams.
Additional Requirements
Strong understanding of change management, CMDBs, and policy-based infrastructure enforcement.
Experience managing parallel storage systems (NetApp, BeeGFS, or Lustre) and integrating them with compute and replication workflows is beneficial.
Availability for emergency escalations and out-of-hours troubleshooting for priority systems.
Travel may be required to support infrastructure at remote datacenter or partner sites.
Experience supporting CAD/FEA engineering workloads, CI pipelines, and EDA is highly desirable.