About The Role
As a Site Reliability Engineer, you will be focused on supporting a highly available and extremely secure production environment. You will analyze the reliability of our environments, use your coding skills to improve the way we operate, and debug sophisticated events as part of an on-call rotation.
What You Will Be Working On
- Collaborate in the expansion and implementation of new servers and deployments across the globe.
- Work with teams to understand and implement client-specific workflows and processes.
- Work in a Ruby codebase to update and modernize patterns and systems.
- Debug a real-world, global web application, triaging issues while working with security or app teams to deliver long-term solutions.
- Mentor peer engineers, write code, and do code reviews.
- Be part of a production on-call crew.
Minimum Qualifications
- Strong skills in designing, deploying, and operating mid- to large-scale enterprise or cloud environments, particularly on AWS or bare metal.
- Proficient in scripting or coding with languages such as Ruby, Go, Python, or Bash, with experience using Infrastructure as Code tools like Terraform and Ansible.
- Experience in supporting externally-facing production environments.
- In-depth knowledge of *nix system administration.
- Good knowledge of observability stacks, including Grafana and Prometheus.
Preferred Qualifications
- You daringly jump into other people's source code to track down a problem.
- You care about the customer experience and have supported an externally-facing production environment, ideally in a team that follows the sun.
- You empathize with your coworkers and are a positive influence on others.