Provide infrastructure support for Red Hat OpenShift, drawing on specialist skills in NVIDIA technologies, containers, and Kubernetes.
Perform daily system monitoring: verify the integrity and availability of all hardware, resources, systems, and key processes; review system and application logs; and verify that scheduled jobs have completed.
Provide support in response to requests from various constituencies. Investigate and troubleshoot issues.
Repair and recover from hardware or software failures. Coordinate and communicate with the owners of impacted infrastructure.
Perform preventive maintenance (and upgrades, as required) on devices and related peripherals to meet IT specifications.
Essential Requirements:
12+ years of industry experience, including Red Hat OpenShift, container, and Kubernetes administration.
Hands-on knowledge of NVIDIA AI Enterprise and NVIDIA GPU & network operations
Knowledge of NVIDIA Base Command Manager & Cluster Manager
Knowledge of network administration with NVIDIA Onyx switch systems
Desirable Requirements:
Knowledge of observability & log collection (Prometheus and Grafana)