Expoint - all jobs in one place

Red Hat Senior Site Reliability Engineer - AI Application Platform/OpenShift
Czechia, Southeast, Brno 
595641255

17.04.2025

Job Description

What you will do:

  • Build and manage our large-scale infrastructure and platform services, including public cloud, private cloud, and datacenter-based environments

  • Automate cloud infrastructure through the use of technologies (e.g. auto scaling, load balancing), scripting (Bash, Python, and Golang), and monitoring and alerting solutions (e.g. Splunk, Splunk IM, Prometheus, Grafana, Catchpoint)

  • Participate in the design and development of software such as Kubernetes operators, webhooks, and CLI tools

  • Implement and maintain intelligent infrastructure and application monitoring designed to enable other engineering teams

  • Ensure the production environment is operating in accordance with established procedures and best practices

  • Provide escalation support for high severity and critical platform-impacting events

  • Provide feedback on bugs and feature improvements to the various Red Hat Product Engineering teams

  • Contribute software tests and participate in peer review to increase the quality of our codebase

  • Help develop peers’ capabilities through knowledge sharing, mentoring, and collaboration

  • Participate in a regular on-call schedule, supporting the operational needs of our tenants

  • Practice sustainable incident response and blameless postmortems

  • Work within a small agile team to develop and improve SRE methodologies, support your peers, plan and self-improve

What you will bring:

  • 5+ years of experience using cloud providers and technologies (Google, Azure, Amazon, OpenStack, etc.)

  • 3+ years of experience administering a Kubernetes-based production environment

  • 3+ years of experience with enterprise systems monitoring

  • 3+ years of experience with enterprise configuration management software like Ansible by Red Hat, Puppet, or Chef

  • 2+ years of experience programming with at least one object-oriented language; Golang, Java, or Python are preferred

  • 2+ years of experience delivering a hosted service

  • Demonstrated ability to quickly and accurately troubleshoot system issues

  • Solid understanding of standard TCP/IP networking and common protocols like DNS and HTTP

  • Demonstrated comfort with collaboration, open communication, and reaching across functional boundaries

  • Passion for understanding users’ needs and delivering outstanding user experiences

  • Independent problem-solving and self-direction

  • Works well alone and as part of a global team

  • Experience working with Agile development methodologies