What you'll be doing:
Architect the scaling operation in our data centers. Deploy and support an end-to-end container management solution with Kubernetes, Docker, and containerd.
Design solutions with service discovery, networking, monitoring, logging, and scheduling in Kubernetes
You will be working on challenging infrastructure problems such as job scheduling, resource management, and automated recovery.
Draw on your depth in algorithms and your system software background!
Work in teams to deploy new data center infrastructure.
Plan and implement critical metrics tracking using various data analytics and data mining methods and dashboards.
Use AI techniques to extract useful signals about machines and jobs from the data generated!
Take part in prototyping, crafting, and developing cloud infrastructure for NVIDIA.
Develop various device plugins and Operators on Kubernetes
Build complete solutions including metrics, alerting, and storage services
You want to dig into more data, analyze much more, and apply deep learning and machine learning algorithms to improve the performance and predictability of the system
What we need to see:
Strong object-oriented programming background in Python/Golang/Java and/or relevant scripting languages
Background in developing large scale cloud infrastructure applications
Knowledge of various technologies (Kubernetes, message brokers)
Experience with relational databases such as MySQL and NoSQL databases such as Elasticsearch
Proficiency with configuration management tools like Ansible, Chef, and Puppet, and strong experience with Jenkins and/or other CI systems.
Ability to collaborate across multiple teams and with people working in different time zones.
Experience with analytics/visualization tools like Kibana, Grafana, Splunk, etc.; experience with monitoring systems such as Zabbix and/or Nagios is nice to have
BS/MS in Computer Science or Computer Engineering or equivalent experience
5+ years of proven experience.
Ways to stand out from the crowd:
Real-world experience with distributed systems, containers, and the Kubernetes API.
Previous experience with DevOps teams
You have worked on computer algorithms and demonstrated the ability to choose the best possible algorithms to solve sophisticated problems
You are able to divide sophisticated problems into simple subproblems and then reuse available solutions to solve them.
Experience in the design, implementation, and deployment of major infrastructure features across multiple servers using incremental rollouts