Expoint - all jobs in one place

Team8 SRE Team Leader 
Israel 
56736139

10.04.2025

The Site Reliability Engineering (SRE) team is based primarily in Israel and the US, and the 24/7 Network Operations Center (NOC) squad will be based in a location to be determined.

Site Reliability Engineering (SRE)

  1. Production Gatekeeper: Design and enforce the rollout strategy for new technologies and oversee its execution to minimize disruption to existing systems.
  2. Production On-Call: Act as the first line of response for critical incidents: assess and triage issues, then coordinate with the team to contain the impact and restore services swiftly.
  3. Monitor Production Performance and Degradation: Track system performance metrics closely and detect degradation early to prevent outages and disruptions.
  4. Production Maintenance: Conduct regular infrastructure upgrades to keep pace with changes and advancements in the technological landscape.
  5. Manage Release Flow: Oversee the release of updates and new functionality, ensuring a seamless transition and mitigating any negative impact on production.
  6. Staging Management: Manage the staging environment, ensuring that it accurately mirrors production for effective testing and simulation.

Network Operations Center (NOC)

  1. Build Playbooks: Develop and maintain comprehensive playbooks for handling system issues and incidents, with guidelines for troubleshooting, escalation, and resolution.
  2. Build Monitoring Dashboards: Design, set up, and maintain monitoring dashboards that visualize and track system performance and incidents in real time.
  3. Alerts and Incident Management: Establish protocols for issuing alerts when system issues or anomalies occur, and lead the team in incident resolution.