Design and implement state-of-the-art GPU compute clusters. Optimize cluster operations for maximum reliability, efficiency, and performance. Drive foundational improvements and automation to enhance researcher productivity. Troubleshoot, diagnose, and root cause...