About the Role
What You’ll Do
1. Support Production Readiness: Participate in incident reviews, postmortem processes, and quality audits. Help enforce production standards across services.
2. Contribute to Metrics & Automation: Build or improve tools and dashboards (e.g., Tableau, SQL-based) to monitor reliability, measure SLA adherence, and identify improvement opportunities.
4. Improve Engineering Workflows: Help maintain runbooks, lockdown policies, production readiness reviews (PRRs), and alerting hygiene to streamline operational practices.
5. Learn from Experience: Shadow experienced engineers, attend architecture reviews, and gain exposure to high-severity incident handling and quality programs.
Basic Qualifications
1. Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
2. 3+ years of experience in software development, operations, incident management, or production engineering.
3. Comfortable writing basic SQL and/or Python scripts to extract and analyze production data.
4. Strong communication and documentation skills, with attention to detail.
5. Interest in systems reliability, incident management, and continuous improvement.
Preferred Qualifications
1. Experience working with incident tracking systems (e.g., Jira, PagerDuty) or monitoring and dashboarding platforms (e.g., Tableau, Grafana, Prometheus).
2. Exposure to production support, quality assurance, or DevOps environments.
3. Familiarity with topics such as SLAs, SLOs, alerting, or postmortem workflows.
What You’ll Gain
1. Mentorship from senior production engineers and reliability experts.
2. Deep exposure to production systems, incident operations, and quality tooling at scale.
3. Opportunities to build tools and dashboards to monitor and improve production quality.
4. Opportunities to lead initiatives to standardize quality and reliability policies and processes.
* Accommodations may be available based on religious and/or medical conditions, or as required by applicable law. To request an accommodation, please reach out to .