Being the cybersecurity partner of choice, protecting our digital way of life.
Your Career
As a Senior DevOps Engineer on our Production Engineering team, you will be at the forefront of ensuring the stability, scalability, and performance of our production systems. You’ll be responsible for the health of large-scale cloud environments, investigating incidents, driving root cause analysis, and implementing long-term solutions that improve system reliability. You’ll also own and continuously improve the production release process, ensuring deployments are safe, automated, and well-orchestrated. You’ll collaborate closely with engineering, platform, and SRE teams to ensure world-class operational excellence for our customer-facing services.
Your Impact
Own the end-to-end release process: plan, coordinate, and execute deployments across environments with a strong focus on safety, reliability, and automation
Ensure stability and performance of all production systems, maintaining high availability through proactive monitoring and incident management
Investigate and resolve complex production issues, driving post-incident reviews and implementing long-term fixes
Respond to critical incidents and customer escalations with a calm, structured approach and clear communication
Define and uphold best practices for change management, observability, and system reliability
Manage infrastructure-as-code using Terraform for scalable cloud deployments
Improve monitoring, alerting, and recovery mechanisms to detect and resolve issues faster
Automate repetitive operational tasks through scripting and tooling
Collaborate with development teams to ensure smooth delivery and stable operation of new features
Participate in an on-call rotation to support production systems
Your Experience
5+ years of experience supporting large-scale production systems in cloud environments
Strong hands-on experience with Linux systems and networking fundamentals
Solid experience with cloud platforms (GCP preferred)
Hands-on experience running large-scale Kubernetes clusters in production
Expertise with Terraform and infrastructure-as-code principles
Strong scripting skills (e.g., Python, Bash) to build automation and tooling
Proven experience owning or contributing to release and deployment processes
Familiarity with observability tools such as Prometheus and Grafana, or similar
Ability to lead incident investigations and drive root cause resolution
Excellent communication and collaboration skills
All your information will be kept confidential according to EEO guidelines.