Reliability Availability Serviceability Expert jobs at Nvidia in United States, Texas
Discover your perfect match with Expoint. Search for job opportunities as a Reliability Availability Serviceability Expert in United States, Texas, and join the network of leading companies in the high-tech industry, like Nvidia. Sign up now and find your dream job with Expoint.
12 jobs found
01.07.2025
Nvidia Senior Site Reliability Engineer - DGX Cloud United States, Texas
Design, implement and support operational and reliability aspects of large-scale Kubernetes clusters with a focus on performance at scale, real-time monitoring, logging and alerting. Engage in and improve the...
Develop software solutions to ensure reliability and operability of large-scale systems supporting machine-critical use cases. Gain a deep understanding of our system operations, scalability, interactions, and failures to identify improvement...
The team will provide its services 24/7 in a follow-the-sun model that spans continents. You will report directly to a manager in the United States. Each team member will...
Recruit, develop, and inspire a team of Site Reliability Engineers, fostering a strong culture of collaboration, ownership, and technical excellence. Provide mentorship, guidance, and career development opportunities to help your...
Design, deploy and support large-scale, distributed GPU clusters to run high-performance AI and machine learning workloads. Continuously improve infrastructure provisioning, management, and monitoring through automation. Ensure the highest level of...
Develop software on Pre-Si environments (Simulation/Emulation). Own and drive CUDA enablement for new Silicon and Architecture. Work with SW, HW and relevant teams to develop, stabilize and productize CUDA features for new...