What you'll be doing:
Be part of the NVIDIA AIR team building the SaaS/IaaS platform for the digital twin of AI data centers.
Handle DevOps and network engineering requirements for AIR.
Focus on efficiency by automating repetitive workflows.
Work on microservices-based architecture.
Deploy and troubleshoot infrastructure for non-disruptive cloud operations, with an emphasis on secure production infrastructure.
Help debug and improve network operations across the infrastructure.
Manage deployments and upgrades of operating systems, Kubernetes (k8s) clusters, and other orchestration tools.
Provide day-to-day support for engineering activities with version-control and CI/CD tools such as Git and Jenkins.
Efficiently multi-task across different tracks to address evolving priorities.
What we need to see:
BS degree in Computer Science, Software Engineering, or a related field (or equivalent experience).
4+ years of experience working with complex microservices-based architectures.
Expertise in automation with hands-on skills in Ansible, Python, and Shell Scripting.
Proficiency in Kubernetes, Docker, QEMU, and Libvirt.
Experience in IaaS environments, including deploying, configuring, and administering Linux-based bare metal servers.
Deep experience in infrastructure engineering, focused on managing and monitoring highly available production infrastructure.
Strong networking background (TCP/UDP, VLANs, routing, VPNs).
Experience with modern deployment architecture for non-disruptive cloud operations, including blue-green and canary rollouts.
Deep knowledge of AWS.
Ways to stand out from the crowd:
Background with AWS deploying multifaceted, load-balanced, and highly available workloads.
Experience with relational databases such as MySQL, and with SQL.
Proficiency in debugging network issues in both infrastructure and SDN.
Experience with Prometheus/Grafana.
Experience implementing robust metrics collection and alerting infrastructure.
You will also be eligible for equity and benefits.