As a Senior Site Reliability Engineer (SRE), you will work closely with development, operations, and product teams to ensure the platform’s reliability, scalability, and performance. You’ll play a pivotal role in bridging gaps between software development and system operations, designing robust processes, and automating essential tasks to enhance the stability of the F5 Distributed Cloud product.
Key Responsibilities:
- Ensure availability, performance, and scalability of data plane services across global regions.
- Design and implement observability, metrics collection, dashboards, and alerting for proactive issue detection.
- Troubleshoot complex networking issues, including packet loss, latency, TLS handshake failures, routing, and DNS.
- Collaborate with platform, traffic, and product teams to improve system reliability and drive architectural improvements.
- Participate in on-call rotations, handle production incidents, and follow up with blameless postmortems.
- Contribute to incident response playbooks and continuous improvement of runbooks and monitoring.
- Drive capacity planning and performance testing for edge or core data plane services.
- Implement chaos testing, failover strategies, and resiliency best practices.
Requirements:
- 8+ years of experience, with a strong understanding of Linux systems, networking (TCP/IP, HTTP, DNS, TLS), and containers.
- Hands-on experience with Envoy, NGINX, or service mesh architectures (e.g., Istio, Linkerd).
- Familiarity with Kubernetes, especially sidecars, DaemonSets, and network policies.
- Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Loki, ELK).
- Proficiency in scripting or software development (e.g., Python, Go, Bash).
- Good understanding of data plane security principles such as mTLS, authentication, and traffic isolation.
Nice to Have:
- Experience operating multi-region traffic management systems or edge nodes.
- Prior experience with DDoS mitigation, rate limiting, WAFs, or API gateways.
- Understanding of SLAs, SLOs, and SLIs, and experience defining service reliability objectives.
This job description is intended to be a general representation of the responsibilities and requirements of the job. However, it may not be all-inclusive, and responsibilities and requirements are subject to change.
The annual base pay for this position is: $166,625.00 - $249,937.00