Expoint - all jobs in one place

Finding the best job has never been easier

Limitless High-tech career opportunities - Expoint

IBM Site Reliability Engineer Co-op Jan - months 
Canada, Ontario, Markham 
286549893

16.09.2024

Your Role and Responsibilities
As a Site Reliability Engineer, you will work in an agile, collaborative environment to build, deploy, configure, and maintain systems for the IBM client business. In this role, you will lead the problem resolution process for our clients, from analysis and troubleshooting, to deploying the latest software updates & fixes.
Your primary responsibilities include:
  • 24×7 Observability: Be part of a worldwide team that monitors the health of production systems and services around the clock, ensuring continuous reliability and optimal customer experience.
  • Cross-Functional Troubleshooting: Collaborate with engineering teams to provide initial assessments and possible workarounds for production issues. Troubleshoot and resolve production issues effectively.
  • Deployment and Configuration: Leverage Continuous Delivery (CI/CD) tools to deploy services and configuration changes at enterprise scale.
  • Security and Compliance Implementation: Implementing security measures that meet or exceed industry standards for regulations such as GDPR, SOC2, ISO 27001, PCI, HIPAA, and FBA.


Required Technical and Professional Expertise

  • System Monitoring and Troubleshooting: Strong skills in monitoring/observability, issue response, and troubleshooting for optimal system performance.
  • Automation Proficiency: Proficiency in automation for production environment changes, streamlining processes for efficiency, and reducing toil.
  • Linux Proficiency: Strong knowledge of Linux operating systems.
  • Operation and Support Experience: Demonstrated experience in handling day-to-day operations, alert management, incident support, migration tasks, and break-fix support.


Preferred Technical and Professional Expertise

  • Kubernetes/OpenShift: Strongly preferred experience in working with production Kubernetes/OpenShift environments.
  • Automation/Scripting: In depth experience with the Ansible, Python, Terraform, and CI/CD tools such as Jenkins, IBM Continuous Delivery, ArgoCD
  • Monitoring/Observability: Hands on experience crafting alerts and dashboards using tools such as Instana, New Relic, Grafana/Prometheus