What you'll do:
- Incident Response Leadership: Develop and implement best practices for leading incident response calls, ensuring efficient and effective handling of incidents to minimize downtime and impact.
- Root Cause Analysis: Perform in-depth RCAs for major incidents, identifying systemic issues and contributing factors to prevent future occurrences.
- Action Item Alignment: Ensure that RCA action items are properly aligned to address and solve the root causes, driving continuous improvement in our systems and processes.
- Cross-Functional Collaboration: Work closely with engineering, operations, and product teams to implement solutions that enhance system reliability and performance.
- Best Practice Development: Create and maintain documentation on incident response and RCA processes, sharing knowledge and training team members to elevate the overall SRE practice.
- Mentorship and Leadership: Mentor junior engineers and contribute to a culture of technical excellence and continuous learning within the SRE team.
What you'll bring:
- Extensive Experience: 10+ years of experience in software engineering or site reliability engineering, with a strong focus on incident management and root cause analysis.
- Technical Expertise: Deep knowledge of system design, networking, cloud infrastructure, and large-scale distributed systems.
- Leadership Skills: Proven ability to lead and manage high-stress incident response calls, making critical decisions under pressure.
- Analytical Mindset: Strong analytical skills to perform comprehensive RCAs and develop actionable solutions.
- Collaboration and Communication: Excellent communication skills, with the ability to convey complex technical concepts to diverse audiences and collaborate effectively across teams.
- Tool Proficiency: Experience with monitoring and incident management tools (e.g., Prometheus, Grafana, PagerDuty, Splunk).
- Innovative Thinking: A proactive approach to identifying improvements and implementing best practices in incident response and system reliability.
Beyond our great compensation package, you can receive incentive awards for your performance. Other great perks include 401(k) match, stock purchase plan, paid maternity and parental leave, PTO, multiple health plans, and much more.
The above information has been designed to indicate the general nature and level of work performed in the role. It is not designed to contain or be interpreted as a comprehensive inventory of all responsibilities and qualifications required of employees assigned to this job. The full Job Description can be made available as part of the hiring process.
You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes. The amount you receive depends on your job classification and length of employment. It will meet or exceed the requirements of paid sick leave laws, where applicable.
For information about PTO, see
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to a specific plan or program terms.
For information about benefits and eligibility, see
The annual salary range for this position is $127,000.00-$219,000.00. Additional compensation includes annual or quarterly performance bonuses. Additional compensation for certain positions may also include:
- Stock
Minimum Qualifications:
Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
- Bachelor’s degree in Computer Science and 6 years’ experience in software engineering or related field OR 8 years’ experience in software engineering or related field.
Preferred Qualifications:
Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.
508 SW 8th St, Bentonville, AR 72712, United States of America