EXPECTATIONS AND TASKS
As a Site Reliability Engineer, you will operate and support business-critical cloud services. As part of your daily job, you will proactively monitor service behavior and identify areas for improvement. You will also help develop tools for monitoring and troubleshooting cloud services built on the latest open source and SAP technologies, following SRE principles.
What you will do
· Act as technical expert during Live Site incidents (downtimes of supported services in scope), investigate and solve incidents on a deep technical level.
· Drive root cause analysis and follow-up improvements to prevent issues from reoccurring.
· Perform in-depth troubleshooting and log analysis to identify and solve complex issues in accordance with internal and external SLAs.
· Build software-based solutions to address improvements in service stability and reliability.
· Enhance infrastructure and platform monitoring by gathering system metrics (4 Golden Signals) and implementing tools for recovery.
· Integrate and collaborate closely with development teams and work with them on outputs from Postmortems and product improvements.
· Learn new technologies and keep up to date with the latest development increments.
· Define, advocate, and apply SRE best practices.
· Participate in the on-call rotation (follow-the-sun approach) to react to major incidents. On-call duty comes with a special compensation package.
EDUCATION AND QUALIFICATIONS / SKILLS AND COMPETENCIES
· Bachelor's degree in computer science or engineering or equivalent combination of education and experience
· Good understanding of modern cloud architectures (experience with cloud platforms such as AWS, Azure, or GCP is a plus)
· Enthusiasm for automation - make the computers do the work for you
· Excellent team player, passionate about their work, self-motivated and driven
· Excellent communication skills - precise, based on facts
· Fluency in English
Professional experience in at least one of the following areas and good knowledge of the rest
§ Scripting and automation
§ Experience with Unix/Linux operating system and good understanding of Linux internals
§ Database (PostgreSQL) Administration and support
· Security best practices for application development and operations in Cloud Environment
Experience with any of the following is considered an advantage
· Network architecture, e.g., TCP/IP, MAC addresses, IP packets, DNS, OSI layers and load balancing
· REST APIs
· Cloud and container technologies such as Cloud Foundry, Kubernetes, Docker
· Git, GitHub, Maven, Jenkins, Gradle
Successful candidates might be required to undergo a background verification with an external vendor.
AI Usage in the Recruitment Process
For information on the responsible use of AI in our recruitment process, please refer to our
Please note that any violation of these guidelines may result in disqualification from the hiring process.