EP SRE - Backup & Recovery Storage Operations at IBM in Costa Rica, 389974945

Your role and responsibilities

Incident Management & Troubleshooting

Respond to alerts, incidents, and outages with a focus on minimizing downtime and restoring services efficiently.
Conduct thorough Root Cause Analysis (RCA) for critical issues and implement long-term solutions to prevent recurrence.

Monitoring & Observability

Design, implement, and manage monitoring solutions to gain insights into system health and performance.
Create and maintain intuitive dashboards that provide real-time visibility into critical metrics.
Set up proactive alerting mechanisms to detect and resolve issues before they impact end users.

Automation & Infrastructure as Code (IaC)

Develop robust automation scripts using tools such as Terraform, Ansible, or CloudFormation to simplify infrastructure management.
Automate repetitive operational tasks to improve system reliability and reduce manual effort.

Documentation & Runbooks

Develop comprehensive runbooks for effective incident response, troubleshooting, and system recovery.
Maintain detailed documentation for infrastructure, processes, and best practices to support team knowledge sharing.

Required education

Bachelor's Degree

Preferred education

Bachelor's Degree

Required technical and professional expertise

Required Skills and Experience

· Strong understanding of and operations.

· Hands-on experience in or practices.

· Proficiency in Kubernetes for container orchestration and cluster management.

· Proficiency in environments with expertise in shell scripting and understanding of Linux internals.

· Experience in writing code using or Go for automation and tooling.

· Practical knowledge of and Terraform for infrastructure automation.

· Solid experience with and feature branching strategies.

· Expertise in and to ensure system observability and performance.

Required Education

· Bachelor’s degree in computer science engineering/information technology

Required Experience

· 1-2+ years
