Expoint - all jobs in one place

Western Digital Professional Information Technology 
United States, South Carolina, Greenville 
166970383

10.07.2024

Company Description

Today’s exceptional challenges require your unique skills. It’s You & Western Digital. Together, we’re the next BIG thing in data.

Job Description

Western Digital’s High-Performance Computing environments are key to bringing new storage solutions to market. As a Senior High-Performance Computing (HPC) engineer in the IT Infrastructure team, you will be at the heart of Western Digital’s engineering and product development process, delivering the IT HPC infrastructure and services that empower engineering teams to develop new storage technologies and deliver high-quality products to market quickly.

What you’ll be doing:

  • Support multi-site, high-performance compute infrastructure and services for the global engineering product development organizations
  • Design, create, deliver, and support the deployment of Ansible automation within HPC and Unix environments
  • Identify and propose solutions and new services for the distributed ASIC and GPU computing clusters
  • Perform troubleshooting and root cause analysis of HPC clusters and file system related issues
  • Develop and maintain documentation for all aspects of the HPC infrastructure
  • Improve root cause analysis and corrective action for problems large and small: identify patterns and propose ways to automate repetitive tasks
  • Recommend and implement solutions to improve the performance of workloads
  • Support diverse Engineering Design Automation environments

Tooling

  • GitHub
  • CI/CD (Jenkins, Terraform, Ansible)
  • Splunk, Grafana, Prometheus

Infrastructure

  • Kubernetes/OpenShift
  • Cloud Computing (AWS, Google Cloud, Azure)
  • Cloud Storage Systems (S3, FSx, CVO)
  • OS: Red Hat and any related distribution
  • Containers (Singularity/Docker)

Qualifications

  • Bachelor’s degree in computer science or equivalent experience
  • Linux systems administration experience, specifically in managing or supporting Red Hat and/or CentOS Linux in production environments
  • Experience with configuration management tools: Ansible, Puppet, Chef
  • Experience with automation tools such as Terraform or other orchestration tools
  • Ability to technically lead a project through the lifecycle
  • Scripting skills: highly skilled in at least two typical scripting languages (shell/Bash, Python, Ruby)
  • Excellent problem-solving, multitasking, troubleshooting skills, and attention to detail are required to work in this challenging and dynamic environment
  • Very strong interpersonal, customer service, and team-building skills, with a results-oriented approach