F5 Sr Site Reliability Engineer 
United States, California, San Jose 
854370060

Yesterday


As a Senior Site Reliability Engineer (SRE), you will work closely with development, operations, and product teams to ensure the platform’s reliability, scalability, and performance. You’ll play a pivotal role in bridging gaps between software development and system operations, designing robust processes, and automating essential tasks to enhance the stability of the F5 Distributed Cloud product.

Key Responsibilities:

  • Ensure availability, performance, and scalability of data plane services across global regions.
  • Design and implement observability, metrics collection, dashboards, and alerting for proactive issue detection.
  • Troubleshoot complex networking issues, including packet loss, latency, TLS handshake failures, routing, and DNS.
  • Collaborate with platform, traffic, and product teams to improve system reliability and drive architectural improvements.
  • Participate in on-call rotations and handle production incidents, following a blameless postmortem culture.
  • Contribute to incident response playbooks and continuous improvement of runbooks and monitoring.
  • Drive capacity planning and performance testing for edge or core data plane services.
  • Implement chaos testing, failover strategies, and resiliency best practices.

Requirements:

  • 8+ years of experience, with a strong understanding of Linux systems, networking (TCP/IP, HTTP, DNS, TLS), and containers.
  • Hands-on experience with Envoy, NGINX, or service mesh architectures (e.g., Istio, Linkerd).
  • Familiarity with Kubernetes, especially running sidecars, DaemonSets, and network policies.
  • Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Loki, ELK).
  • Proficiency in scripting or development (e.g., Python, Go, Bash).
  • Good understanding of data plane security principles, such as mTLS, authentication, and traffic isolation.

Nice to Have:

  • Experience operating multi-region traffic management systems or edge nodes.
  • Prior experience with DDoS mitigation, rate limiting, WAFs, or API gateways.
  • Understanding of SLAs/SLOs/SLIs and defining service reliability objectives.

The Job Description is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.

The annual base pay for this position is: $166,625.00 - $249,937.00