Expoint - all jobs in one place

IBM Site Reliability Engineer II 
United States, Washington, Bellevue 
953780414

Today
In this role, you will be part of a team that develops and supports the Apptio Kubernetes Platform (AKP)
where all Apptio applications are deployed. In a typical day, you will interact with GitHub, Linux,
Kubernetes, ArgoCD, Docker, Confluence, Jira, Slack, and AWS.

You Are

You Aren’t

A Kubernetes or cloud expert with many years of experience. This is an intermediate position; we want
you to help us, and we also want to help you grow.

Your Role and Responsibilities
• Manage deployments of Apptio services to AKP
• Streamline the deployment process
• Improve observability of the services within your purview by reviewing KPI dashboards and
• Author and maintain documentation of deployment and monitoring processes
• Use runbooks to troubleshoot and triage production issues
• Detect issues and handle Tier 1-2 troubleshooting
• Participate in online “swarm” collaboration sessions
• Collaborate with service developers
• Participate in on-call rotation
• Perform maintenance of the platform (patching, resets, upgrades, etc.)


Required Technical and Professional Expertise
• 1+ years’ experience in an SRE or adjacent role
• Foundational understanding of at least one programming language (preferably Golang) and of source control
• Practical experience with distributed application deployment and management
• Practical experience with container technologies (e.g., Kubernetes, Docker)
• Practical experience with Infrastructure-as-Code (IaC): Terraform, CloudFormation, Ansible
• Experience with cloud provider services such as AWS, Azure, or Google Cloud Platform
• Familiarity with RESTful systems and their APIs
• Demonstrated fluency with the English language

Preferred Technical and Professional Expertise
• 2+ years’ experience in an SRE or adjacent role
• Familiarity with Apptio and IBM product offerings