You will be responsible for the end-to-end infrastructure lifecycle supporting large-scale datacenter and HPC environments across multiple continents. You will:

- Design and operate scalable Slurm-based HPC clusters comprising more than 2,000 nodes distributed globally.
- Lead infrastructure automation for provisioning, monitoring, and configuration management of compute and enterprise services.
- Manage and tune high-availability services, and support site-aware routing, load balancing, and DNS-based traffic distribution.
- Serve as the subject-matter expert on Active Directory integrations, trust relationships, replication latency troubleshooting, and directory service hardening.
- Architect virtualization-based solutions where appropriate to support auxiliary services, container workloads, and hybrid edge compute nodes.
- Oversee secure data replication strategies between sites, integrating load-balancer VIPs and geo-distributed failover configurations.
- Provide root-cause analysis for performance bottlenecks, host instability, or data inconsistencies across global platforms.
- Work closely with platform owners, security teams, and datacenter engineers to evolve the infrastructure toward a zero-touch, self-healing architecture.