Open Connect is our in-house, custom-built network and server infrastructure responsible for …
In addition to delivering streaming video, Open Connect Appliances (OCAs) are ideally situated to reduce latency between clients and the Netflix services running on AWS. The Open Connect Edge Accelerator takes advantage of Open Connect's highly geo-distributed footprint to improve the quality of experience. It is the entry point for device and website traffic, which puts it on the critical path for delivering and monitoring our product experiences.
We are seeking a seasoned Reliability Engineer with extensive experience in *nix, networking, data analysis, and large-scale service operations to design, scale, operate, automate, and analyze our globally distributed CDN, with a focus on the Edge Accelerator services. You will work on reliability, resilience, performance, latency measurement, steering solutions, low-latency reverse proxying, failover mechanisms, protocol optimizations, and DDoS protection, among other areas.
Qualifications
Knowledge of and proven experience with CDNs and HTTP cache/proxy technologies
Deep understanding of Internet protocols like TCP, TLS, HTTP/S, and DNS
Experience building and maintaining highly distributed, scalable, low-latency, fault-tolerant production systems with a focus on security and reliability
Proficient in a programming language such as Go, C, or Python
Experience with distributed analytic processing technologies (Hive, Presto/Trino, Spark SQL, etc.)
Strong communication and documentation skills geared toward cross-team collaboration
Motivated by "the art of the possible" and able to balance idealism with pragmatism
Cool-headed during production issues and able to stay focused on resolving the problem
Preferred: BS in Computer Science, Electrical Engineering, or Computer Engineering (or equivalent professional experience)
Responsibilities
Drive continual improvement in resilience, security, observability, quality of experience (QoE), monitoring, instrumentation, and automation with the primary goal of maintaining highly scalable and reliable CDN services worldwide
Aggregate, analyze, and correlate large amounts of server and application performance data, using the Netflix Big Data platform as a flexible, specialized, and efficient toolset for optimizing service delivery and improving system reliability
Participate in on-call rotation and handle escalations for service delivery production issues
Have lots of discussions about all the great content and your favorite movies and series
This job is open for at least 7 days and will be removed when the position is filled.