
Red Hat Senior Site Reliability Engineer
United States, Massachusetts, Boston 
726187570


About the Job

The Sr. Site Reliability Engineer applies a deep understanding of software and systems engineering principles to design and implement solutions that enhance service reliability.

This position requires good judgment and the ability to prioritize work effectively while contributing to the overall goals of the SRE team and organization.

What You Will Do

  • Lead the development and implementation of robust code and automation scripts to improve service reliability and scalability

  • Conduct thorough code reviews and testing processes to ensure the highest quality standards in the codebase

  • Work to solve moderately complex issues, making decisions that impact the service's reliability and performance

  • Mentor and guide junior engineers, fostering a collaborative environment focused on continuous improvement

  • Engage in a regular on-call rotation, taking responsibility for critical incidents and ensuring timely resolution

  • Lead incident response and postmortem processes, implementing solutions to prevent recurrence of issues

  • Collaborate with cross-functional teams to design, develop, and refine SRE tools and systems that support service objectives

  • Take ownership of tasks and projects, prioritizing them according to their impact on service health and team goals

What You Will Bring

  • Linux Systems Management: Extensive experience managing Linux servers, particularly Red Hat Enterprise Linux (RHEL), CentOS, or Fedora, within cloud environments such as AWS, GCP, or Azure, including advanced system administration, networking, and troubleshooting

  • Automation and Scripting: Proficiency in writing and maintaining scripts for automation and orchestration tasks, using tools like Ansible, Terraform, or custom scripts, to enhance efficiency and reduce manual workload

  • Monitoring and Observability: Expertise in setting up and managing enterprise monitoring and observability solutions (e.g., Prometheus, Grafana), enabling proactive detection and resolution of issues

  • Configuration Management: In-depth experience with configuration management tools such as Puppet, Chef, or similar, ensuring consistent and reproducible system states across environments

  • Incident Management: Proven ability to lead incident response efforts, from initial troubleshooting to root cause analysis and implementing preventative measures

  • Service Delivery and Optimization: Understanding of service delivery processes, with a focus on optimizing performance, reliability, and availability of hosted services

The salary range for this position is $127,890.00 - $211,180.00. Actual offer will be based on your qualifications.

Pay Transparency

● Comprehensive medical, dental, and vision coverage

● Flexible Spending Account - healthcare and dependent care

● Health Savings Account - high deductible medical plan

● Retirement 401(k) with employer match

● Paid time off and holidays

● Paid parental leave plans for all new parents

● Leave benefits including disability, paid family medical leave, and paid military leave