What you’ll be doing:
As an SRE, you are responsible for:
Providing scalable and robust service-oriented infrastructure automation, monitoring, and analytics solutions for NVIDIA's on-prem and cloud-based GPU infrastructure.
Owning the whole life cycle of new tools and services, from requirements gathering to design documentation, validation, and deployment.
Providing customer support on a rotation basis.
What we need to see:
Minimum of 3 years of experience automating and managing large-scale distributed system software deployments in on-prem/cloud environments.
Proficiency in at least one of the following languages: Go, Python, Perl, C++, Java, or C.
Strong command of Terraform, Kubernetes, and cloud infrastructure administration.
Excellent debugging and troubleshooting skills.
Excellent interpersonal and written communication skills.
B.E. in Computer Science or a related technical field involving coding (e.g., physics or mathematics).
Ways to stand out from the crowd:
Ability to decompose complex requirements into simple tasks and reuse available solutions to implement most of them.
Unit testing and benchmarking are an integral part of your code.
Ability to reason about and choose the best algorithm to meet scaling and availability challenges.