What you will do:
Applies software engineering principles to the operations domain.
Contributes to a service's codebase, writes automation that aids in the management of a service, and performs operational engineering work to support a service's Service Level Objectives (SLO).
Ensures service reliability meets users’ needs, including internally critical and externally visible services
Uses software & systems engineering to design, build, and run large-scale, distributed, fault-tolerant systems
Focuses on iterative improvement through toil reduction and error-budget enforcement
Interfaces with both cloud IaaS and SaaS providers and internal stakeholders, including Support, IT, and Product Engineering, to achieve desired outcomes.
Participates in an on-call rotation within a geographically distributed team to provide 24x7x365 production support, with responsibility to respond to urgent customer issues
Practices sustainable incident response and blameless postmortems
Works within a small agile team to develop and improve SRE methodologies, support peers, plan, and self-improve
Provides feedback on bugs and feature improvements to the various Red Hat Product Engineering teams
What you will bring:
Bachelor's degree in computer science or a related technical field involving software or systems engineering, or practical experience demonstrating interest in SRE
2+ years of experience using cloud providers and technologies (Google, Azure, Amazon, OpenStack, etc.)
1+ years of experience administering a Kubernetes-based production environment
2+ years of experience programming with at least one object-oriented language; Golang or Python is a big plus
Ability to collaboratively troubleshoot and solve problems in a team setting
Basic understanding of UNIX or Linux operating systems
The following will be considered a plus:
Demonstrated comfort with collaboration, open communication, and reaching across functional boundaries
Passion for understanding users’ needs and delivering outstanding user experiences
Additional Skills:
Demonstrated ability to quickly and accurately troubleshoot system issues
Solid understanding of standard TCP/IP networking and common protocols like DNS and HTTP
2+ years of experience managing Linux servers running Red Hat Enterprise Linux (RHEL), CentOS, or Fedora hosted at a cloud provider such as Amazon Web Services (AWS), Google Compute Engine (GCE), or Microsoft Azure
1+ years of experience with enterprise systems monitoring
2+ years of experience with enterprise configuration management software like Red Hat Ansible Automation Platform (AAP)
Experience with static code analysis tools
Some experience with code deployment across cloud-based environments
Some experience with continuous integration and continuous deployment approaches
Some experience working with complex distributed systems
Demonstrated ability to debug and optimize code and to automate routine tasks
Problem-solving skills and the ability to work with minimal supervision as part of a global team
Experience working with agile development methodologies