Nvidia Senior DevOps Automation Engineer Fabric Networking - GPU
United States, Texas
191083796

13.05.2025

US, CA, Santa Clara

US, Remote

What You’ll Be Doing:

Drive robust CI/CD workflows to support continuous integration, build, and deployment processes across large-scale environments.
Streamline and enhance release management through strategic automation, orchestration, and intelligent dependency handling.
Improve development velocity by decoupling applications and enabling independent release cadences.
Design and develop automation tools for deploying, provisioning, and maintaining large GPU clusters interconnected via NVLink and InfiniBand.
Implement modern DevOps technologies to automate software updates, perform system maintenance, and monitor cluster health and availability.
Own and resolve daily operational issues in GPU clusters, ensuring high availability and performance through proactive troubleshooting.
Manage seamless software and firmware rollouts and rollbacks across cluster infrastructure, minimizing disruptions.
Collaborate across dynamic engineering and product teams in multiple time zones to align cluster operations with project goals and timelines.

What We Need to See:

BS/MS in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
5+ years of experience managing clusters, servers, and networking infrastructure.
Strong scripting and automation skills with Ansible, Python, and Shell.
Proven experience building enterprise-grade CI/CD pipelines.
Understanding of modern application design patterns, including strategies for migrating and decoupling legacy code.
Solid understanding of Linux systems, networking, and distributed system design.
Strong cross-functional communication and collaboration skills.

Ways to Stand Out from the Crowd:

Experience with Slurm or similar workload/resource managers.
Hands-on experience with NVIDIA DGX systems and GPU-based compute clusters.
Familiarity with building metrics and alerting systems (e.g., Prometheus, Grafana).
Demonstrated leadership in DevOps process improvement and team productivity initiatives.

You will also be eligible for equity and benefits.
