Netflix Site Reliability Engineer L5 - Open Connect 
United States, Oregon 
288050036

20.03.2025
Open Connect is our in-house, custom-built network and server infrastructure responsible for streaming video delivery.

In addition to streaming video delivery, Open Connect Appliances (OCAs) are ideally situated to improve the latency between clients and the Netflix services running on AWS. The Open Connect Edge Accelerator takes advantage of the highly geo-distributed nature of Open Connect to improve the quality of experience. It is the entry point for device and website traffic, which puts it on the critical path for delivering and monitoring our product experiences.

We are seeking a seasoned Reliability Engineer with extensive experience in *nix, networking, data analysis, and large-scale service operations to design, scale, operate, automate, and analyze our globally distributed CDN, with a focus on the Edge Accelerator services. You will work on reliability, resilience, performance, latency measurement, steering solutions, low-latency reverse proxies, failover mechanisms, protocol optimizations, and DDoS protection, to name a few.

Qualifications

  • Knowledge of and proven experience with CDNs and HTTP cache/proxy technologies

  • Deep understanding of Internet protocols like TCP, TLS, HTTP/S, and DNS

  • Experience building and maintaining highly distributed, scalable, low-latency, fault-tolerant production systems with a focus on security and reliability

  • Proficient in a programming language such as Go, C, or Python

  • Experience with distributed analytic processing technologies (Hive, Presto/Trino, Spark SQL, etc.)

  • Great communication and documentation skills targeted at cross-team collaboration

  • Motivated by “the art of possible” and able to balance idealism and pragmatism

  • Cool-headed during production issues, able to focus on problem resolution

  • Preferred - BS in Computer Science, Electrical Engineering, or Computer Engineering (or equivalent professional experience)

Responsibilities

  • Drive continual improvement in resilience, security, observability, quality of experience (QoE), monitoring, instrumentation, and automation with the primary goal of maintaining highly scalable and reliable CDN services worldwide

  • Aggregate, analyze, and correlate large amounts of server and application performance data. Use the innovative Netflix Big Data platform as a highly flexible, specialized, and efficient toolset for service delivery optimization and system reliability improvements

  • Participate in on-call rotation and handle escalations for service delivery production issues

  • Have lots of discussions about all the great content and your favorite movies and series

Job is open for no less than 7 days and will be removed when the position is filled.