Expoint - all jobs in one place

IBM Site Reliability Engineering Professional Compute SRE 
Costa Rica 
12416001

24.06.2024

Your Role and Responsibilities
As a Compute Operations Site Reliability Engineer, you will perform the following tasks:
  • Remotely administer Power Server hardware environments across numerous datacenter locations around the world (currently 18 datacenters and growing).
  • Develop automation to reduce manual toil (repetitive, automatable tasks) using shell scripts (bash, etc.), Python, Ansible, and related tools and languages.
  • Perform code stack updates on infrastructure systems (VIOS, firmware, PowerVC, HMC, NovaLink, NIM servers) as well as cloud supporting systems (jump servers, sobox, network nodes, gateways, TSM servers).
  • Upload/maintain stock images.
  • Remotely administer AIX and Linux servers.
  • Maintain user IDs (add/delete) and passwords.
  • Monitor daily/weekly backups to ensure they are working.
  • Manage and maintain the Nagios monitoring environment, and troubleshoot scripts/plug-ins when issues arise.
  • Perform periodic Live Partition Mobility (LPM) operations, inactive migrations, or remote restarts of customer VMs to carry out system maintenance, balance workloads, or free up resources.
  • Monitor and report the capacity utilized in each datacenter.
  • Attend scheduled meetings planned by customers for cutover/maintenance windows.
  • Verify capacity requirements when customers report provisioning failures.
  • Work with customers to resolve any RSCT issues so that LPM activities can be performed without impacting customer workloads.


Required Technical and Professional Expertise

  • In-depth knowledge of Power Server hardware.
  • Significant scripting/coding experience for automating all aspects of IBM Power systems administration.
  • Automation using Python, shell scripting (bash, etc.), Ansible, and related tools and languages.
  • Experience with AIX and Linux administration, commands, and networking; the role requires experience at the OS level.
  • Strong experience in one or more of the following: VIO, NovaLink, and PowerVC. Familiarity with at least one other (including installation, configuration, and administration).
  • In-depth knowledge of PowerVM including installation/configuration and administration.
  • High-level knowledge of the operating systems supported on Power Systems (AIX and IBM i).
  • In-depth knowledge of how storage is connected and allocated to Power systems via NPIV connections.
  • Good understanding of Power Systems network configuration at the system level.


Preferred Technical and Professional Expertise

  • Experience with configuring and tuning PowerVC.
  • Experience training new personnel on tooling and processes.
  • Storage & Power RTS, MVS Network for Cisco, Juniper; general support skills.