What you'll be doing:
You will be part of the NVIDIA NMX team, which is building the SaaS platform and the on-premises solution for network management and telemetry.
You will be specifically responsible for the DevOps, infrastructure, and Site Reliability Engineering (SRE) requirements for NMX.
Focus on efficiency by automating repetitive workflows.
Working on a microservices-based architecture.
Deploying and troubleshooting non-disruptive cloud operations with an emphasis on secure production infrastructure.
Continuously evaluating existing systems and driving improvements.
Managing deployments and upgrades for operating systems, Kubernetes (k8s) clusters, and/or other orchestration tools.
Day-to-day support for engineering activities with CI/CD tools like Git and Jenkins.
Multitasking across different tracks to address evolving priorities efficiently.
What we need to see:
5+ years of experience in complex microservices-based architectures
Highly skilled in Kubernetes and Docker
A good programming background in one high-level language like Go or Python, or equivalent experience
Strong knowledge of NoSQL DB (e.g. MongoDB), Kafka/Kafka Streams.
Experienced with modern deployment architectures for non-disruptive cloud operations, including blue-green and canary rollouts
Infrastructure as code (IaC) skills in frameworks like Ansible & Terraform
Expert in AWS
Knowledge of the best practices and discipline of managing and monitoring a highly available and secure production infrastructure
Ways to stand out from the crowd:
Skills in Linux/Unix Administration
Experience with Prometheus/Grafana.
Experience with APM tools like Dynatrace, Datadog, AppDynamics, New Relic, etc.
Implemented highly scalable log aggregation systems in the past using the ELK stack or similar
Implemented robust metrics collection and alerting infrastructure