

What you will do:
Design and write automation software to provision, upgrade, monitor, and heal a large global fleet of Red Hat OpenShift clusters deployed across multiple public clouds
Identify single points of failure and other high-risk architecture issues; propose and implement more resilient solutions
Participate in the release cycles of our offerings, deploying code to integration, staging, and production environments, integrating with continuous integration (CI) and continuous delivery (CD) tooling, monitoring, and change management
Perform software updates, peer code reviews, testing, and Common Vulnerabilities and Exposures (CVE) analysis; respond to security threats
Interact with automated monitoring and healing infrastructure to ensure healthy environments
Provide engineering support to Red Hat's global technical support team to resolve customer issues
Create and maintain standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes, and remediating problems in our environment
Participate in a global on-call rotation, including periodic weekend and holiday on-call duties
What you will bring:
3+ years of software engineering experience using object-oriented languages; Go and Python are preferred
Experience managing Linux-based systems in a public cloud like Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure
Commercial experience with enterprise system monitoring; knowledge of Prometheus is a plus
Experience with container technology, Kubernetes, OpenShift, and configuration management tools (Red Hat Ansible Automation, Puppet, or Chef) is a big plus
Demonstrated ability to quickly and accurately troubleshoot systems issues
Solid written and verbal communication skills in English