Your Role and Responsibilities
We are looking for enthusiastic & self-motivated Site Reliability Engineers (SRE) to join the IBM Consulting Advantage (ICA) Asset Engineering team. In this role, you will be responsible for ensuring the reliability and performance of SaaS environments/applications. Primary responsibilities include:
- Triage and troubleshoot issues in a timely and efficient manner
- Monitor system performance and health using monitoring and log management tools
- Collaborate with cross-functional teams to identify and resolve root causes of issues
- Manage runbooks to ensure they are up to date and accurate, and follow their instructions to resolve issues
Required Technical and Professional Expertise
- Minimum 6 years of experience in SRE or DevOps
- Hands-on experience with Kubernetes and CI/CD tools (e.g. Tekton, ArgoCD)
- Strong understanding of cloud technologies, microservices architecture, and container orchestration tools
- Familiarity with monitoring and log management tools (e.g. Prometheus, Grafana) and with alerting tools such as PagerDuty
- Experience in implementing strategies for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning
Preferred Technical and Professional Expertise
- Knowledge of IBM Cloud services and platforms
- Familiarity with scripting languages such as Python or Bash
- Understanding of security best practices and compliance requirements