Your Role and Responsibilities
As a Site Reliability Engineer, you will work in an agile, collaborative environment to build, deploy, configure, and maintain systems for the IBM client business. In this role, you will lead the problem resolution process for our clients, from analysis and troubleshooting to deploying the latest software updates and fixes.
Required Technical and Professional Expertise
- System Monitoring and Troubleshooting: 2-5 years of experience in monitoring/observability, issue response, and troubleshooting for optimal system performance.
- Automation Proficiency: 2-5 years of experience in automation for production environment changes, streamlining processes for efficiency, and reducing toil.
- Linux: 2-5 years of experience working with Linux operating systems.
- Windows: 2-5 years of experience working with Windows operating systems.
- Operation and Support Experience: 2-5 years of experience in handling day-to-day operations, alert management, incident support, migration tasks, and break-fix support.
- Fluent English required
Preferred Technical and Professional Expertise
- Kubernetes/OpenShift: Knowledge of or experience with Kubernetes/OpenShift environments.
- Automation/Scripting: Knowledge of or experience with Ansible, Python, Terraform, and CI/CD tools such as Jenkins, IBM Continuous Delivery, and ArgoCD.
- Monitoring/Observability: Knowledge of or experience with crafting alerts and dashboards using tools such as Instana, New Relic, and Grafana/Prometheus.
- DBA: Interest in or experience with configuring and maintaining SQL, NoSQL, and data streaming technologies (e.g. DB2, PostgreSQL, CouchDB, Redis, Kafka, Spark).