
NVIDIA Senior Software Development Engineer in Test
United States, California 
767152952

02.07.2025
Location: US, CA, Santa Clara
Time type: Full time
Posted: 30+ days ago

What you'll be doing:

  • Work with development teams on test planning, execution, reviews, and failure analysis for all layers of the software stack for cloud infrastructure, and assess overall quality and risk. Work with customer PMs on software issues, including technical feedback from OEMs and CSPs. Develop key KPIs to track execution, and deploy process improvements to raise efficiency.

  • Lead NVIDIA Cloud and Data Center bring-up activities, including validation, reporting, working with engineering to debug issues, occasionally providing design input, and adding test coverage across different areas.

  • Design, develop and maintain CI/CD pipelines for continuous testing in cloud environments when needed.

  • Perform performance, scalability, and reliability testing of cloud services.

  • Implement and maintain test environments in cloud platforms such as AWS, Azure, or Google Cloud.

  • Monitor the infrastructure and alert on significant events to ensure the highest level of system performance and reliability.

  • Work with partner teams to ensure clusters are available for testing, and take the lead in resolving any issues.

  • Work with teams to ensure the quality of delivered cloud products, focusing on critical areas such as security, storage, workloads, and performance on the latest SW and FW components.

What we need to see:

  • A Master's or Ph.D. in Computer Science or a related field, or equivalent experience.

  • Experience with AI development tools for creating and automating test cases, measuring code coverage, and triaging issues.

  • 4+ years of hands-on experience with cluster management and related tools, including Docker containers, Slurm, Kubernetes, and Ansible.

  • 2+ years of strong experience with cloud infrastructure platforms such as AWS, Azure, Google Cloud, and OCI.

  • Proficiency in Unix/Linux, with shell and Python programming skills.

  • Hands-on experience with network, storage, security, and cluster configuration and debugging, and with cloud infrastructure management tools such as Terraform and Ansible.

  • Expertise in administering, operating, and configuring Kubernetes.

  • Experience with CI/CD tools such as GitLab and Jenkins, and with the GitOps model.

  • Proficiency with monitoring tools such as Prometheus, Grafana, CloudWatch, and Thanos.

  • Proficiency in debugging issues involving networks, DHCP, DNS, HTTP, Linux, and containers.

Ways to Stand Out from the Crowd:

  • Familiarity with Bright Cluster Manager for managing and monitoring high-performance computing (HPC) clusters.

  • Experience writing automation for web applications using tools such as Selenium and Playwright.

You will also be eligible for equity and benefits.