Expoint - all jobs in one place

NVIDIA Data Center Test Development Architect
United States, California 
753234115

24.06.2024

What you'll be doing:

  • Engage with product engineering teams to gain a comprehensive understanding of their infrastructure use cases.

  • Provide mentorship to SWQA teams on effectively testing at scale.

  • Develop end-to-end test plans that exercise all layers of the software stacks for NVIDIA cloud-based infrastructure.

  • Lead NVIDIA Data Center bring-up activities from an SWQA perspective.

  • Develop sophisticated tooling to automate the build and deployment of microservices and infrastructure components, improving efficiency and productivity.

  • Reduce manual labor and increase operational efficiency through automation.

  • Monitor the infrastructure and alert on significant events, ensuring the highest level of system performance and reliability.

  • Work closely with partners to understand their infrastructure needs and to ensure our testing encompasses their use cases.

What we need to see:

  • A Master's or Ph.D. in Computer Science or a related field, or equivalent experience.

  • 4+ years of hands-on experience in cluster management and related tools, including Docker Containers, Slurm, Kubernetes, and Ansible.

  • 8+ years of strong experience with cloud infrastructure platforms like AWS, Azure, or Google Cloud.

  • Hands-on experience with server platform, network, storage, cluster configuration and debugging.

  • Experience with platform telemetry and datacenter node lifecycle management/support, including CPU/GPU workloads.

  • Proficiency in scripting languages such as Python.

  • Expertise in administering, operating, and configuring Kubernetes and Envoy.

  • Proven experience with Continuous Integration/Continuous Delivery (CI/CD) tools such as GitLab and Jenkins, and with the GitOps model.

  • Proficiency in various monitoring tools: Prometheus, Grafana, CloudWatch, and Thanos.

  • Strong background in cloud security, Kubernetes security, and application security.

  • Proficiency in debugging issues involving networks, DNS, HTTP, Linux, and containers.

  • Strong analytical and problem-solving skills, along with an ability to articulate what you know to others.

Ways to Stand Out from the Crowd:

You will also be eligible for equity and .