Service Reliability Operations Administrator jobs at Nvidia
Advance your career in high tech with Expoint. Discover job opportunities as a Service Reliability Operations Administrator and join top companies in the industry such as Nvidia. Sign up today and take control of your future.
52 jobs found
Yesterday
Nvidia Senior Platform EngOps Engineer - Cluster Operations United States, California
Develop automated tools to efficiently deploy, provision, and maintain extensive GPU clusters interconnected via NVLink and InfiniBand. Implement modern DevOps tools to automate software updates, perform maintenance tasks, and monitor...
Develop comprehensive operational plans and de-risking strategies to ensure flawless technical execution of technical training events. Provide expert, hands-on technical leadership during live training events, managing deployments and rapidly resolving...
Design, implement and support operational and reliability aspects of large scale Kubernetes clusters with focus on performance at scale, real time monitoring, logging and alerting. Engage in and improve the...
Architect, build, and evolve the scalable technology stack for global learner and instructor technical support. Lead the global operationalization of support systems, to ensure high availability, performance, and efficient resource...
Develop software solutions to ensure reliability and operability of large-scale systems supporting machine-critical use cases. Gain a deep understanding of our system operations, scalability, interactions, and failures to identify improvement...
Work closely with researchers and engineers in the team and track to-dos daily. Collect requirements, define priorities, understand critical roadblocks, and communicate effectively with team leads. Coordinate with lab ops...
Define and operationalize event creative system with the creative director for corporate events including look and feel, templates, guidelines, and training. Closely align with internal partners to seek out details...
Discover your dream career in the high tech industry with Expoint. Our platform offers a wide range of Service Reliability Operations Administrator job opportunities, giving you access to the best companies in the field, like Nvidia. With our easy-to-use search engine, you can quickly find the right job for you and connect with top companies. No more endless scrolling through countless job boards: with Expoint you can focus on finding your perfect match. Sign up today and follow your dreams in the high tech industry with Expoint.