What you'll be doing:
Administer Linux systems ranging from powerful DGX servers to embedded systems, and from bring-up hardware to publicly available systems.
Coordinate storage solutions and plan for growth.
Automate configuration management, software updates, maintenance, and monitoring of system availability using modern DevOps tools (Ansible, GitLab, etc.).
Actively communicate with management about any problems with the equipment and propose resolutions.
Plan, build, and install/upgrade new systems that support NVIDIA DL software.
What we need to see:
You have a BA, BS, or MS in CS, EE, CE or equivalent experience
4+ years of experience deploying and administering HPC clusters
Familiarity with resource scheduling managers (Slurm preferred, LSF, etc.)
Proven track record of scripting in Bash, Perl, or Python
Experience with containers (Docker, Singularity, LXC)
Deep understanding of operating systems, computer networks, and high-performance applications
Ability to work well with developers & test engineers
Dedication to providing quality support for your users
Ways to stand out from the crowd:
Familiarity and prior work experience with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker
Familiarity with GPU usage in compute clusters and with CUDA
Experience with mobile and embedded systems
Basic knowledge of deep learning
Experience coding/scripting in Perl, Python, or Bash
You will also be eligible for equity.