Fortinet, founded over 20 years ago, has become a driving force in the evolution of cybersecurity and the convergence of networking and security. Our mission is to secure people, devices, and data everywhere.
What You Will Do:
- 1. Reliability & Performance Optimization
- Ensure high availability of services by implementing best practices in monitoring, alerting, and incident management.
- Conduct capacity planning, performance tuning, and load testing to optimize system efficiency.
- Implement self-healing mechanisms to minimize downtime and improve fault tolerance.
- 2. Infrastructure & Automation
- Design, implement, and maintain scalable Kubernetes clusters across multiple environments.
- Automate infrastructure provisioning using Terraform, Helm, or Ansible.
- Manage CI/CD pipelines to streamline deployments and reduce manual intervention.
- 3. Monitoring & Incident Response
- Develop and maintain observability solutions using Prometheus, Grafana, ELK, or OpenTelemetry.
- Set up automated alerting and on-call rotations to ensure proactive issue resolution.
- Perform root cause analysis (RCA) and post-mortems to drive continuous improvements.
- 4. Security & Compliance
- Enforce security best practices, including IAM policies, network security, and container security.
- Ensure compliance with industry standards (SOC 2, ISO 27001, etc.) through automated security checks.
- 5. Collaboration & Documentation
- Work closely with development, security, and operations teams to align reliability goals with business needs.
- Maintain clear and up-to-date runbooks, incident reports, and system documentation.
We Are Looking for:
- 3+ years of experience in SRE, DevOps, or Cloud Infrastructure roles.
- Strong knowledge of Kubernetes, Docker, and container orchestration.
- Proficiency in cloud platforms (AWS, GCP, or Azure) and infrastructure as code tools like Terraform.
- Experience with observability tools (Prometheus, Grafana, Datadog, New Relic, etc.).
- Expertise in CI/CD pipelines (Jenkins, GitHub Actions, ArgoCD, Flux, etc.).
- Proficiency in scripting & automation (Python, Go, Bash).
- Understanding of networking, load balancing, DNS, and security best practices.
- Excellent troubleshooting skills with a focus on incident resolution and post-mortem analysis.
Preferred Qualifications:
- Experience with multi-cluster Kubernetes management and service mesh (Istio, Linkerd).
- Familiarity with GitOps workflows and policy-driven infrastructure.
- Knowledge of machine learning-driven anomaly detection for proactive monitoring.
Working Conditions:
This position requires working from the office full-time; remote work is not available.