What you’ll be doing:
Architect the scaling operation in our data centers. Deploy and support an end-to-end container management solution with Kubernetes, Docker, and containerd. Design solutions with service discovery, networking, monitoring, logging, and scheduling in Kubernetes.
Set up and manage end-to-end Jenkins instances: tools, plugins, nodes, user management, backup, restore, monitoring, etc. Design and develop tools needed for automating maintenance of 10,000+ hosts with only 10 support engineers.
Apply your depth in algorithms and your system software background!
Work in teams to deploy new data center infrastructure.
Plan and implement critical metrics tracking using various data analytics and data mining methods and dashboards.
Reuse AI techniques to extract useful signals about machines and jobs from the data generated!
Take part in prototyping, crafting, and developing cloud infrastructure for NVIDIA.
What we need to see:
Strong Kubernetes understanding and background, especially with on-premises setups, and extensive experience with Kubernetes components and subsystems.
Experience maintaining large-scale cloud/on-prem infrastructure applications using Kubernetes
Proven programming background in Python/Golang/Java and/or relevant scripting languages
Excellent debugging and analytical skills, and experience with databases, both SQL (MySQL) and NoSQL (Elasticsearch/MongoDB)
Proficient with configuration management tools like Ansible, Chef, and Puppet, and strong experience with Jenkins and/or other CI systems.
Hands-on experience with VMs, Docker, and Kubernetes clusters.
Experience with analytics/visualization tools like Kibana, Grafana, Splunk, etc.; experience with monitoring systems such as Zabbix and/or Nagios is nice to have
12+ years of proven experience
Bachelor's or Master's degree or equivalent experience in CS, Software Engineering, or a related field.
Ways to stand out from the crowd:
Previous experience with DevOps teams
Thrives in a multitasking environment with constantly evolving priorities, and documents work well
Outstanding collaboration skills across organizational boundaries; experience using and improving data centers and working with computer algorithms; and the ability to choose the best possible algorithms to meet the scaling challenge
Ability to divide complex problems into simple subproblems and then reuse available solutions to implement most of them
Ability to design simple systems that can work reliably without needing much support