Monitor and improve the availability, performance and security of production services
Apply preventive measures to improve the reliability of production services
Mitigate issues on production systems and build automated solutions to prevent them from recurring
Enhance and feed the monitoring system to improve service reliability and to give other CyberArk teams the dashboards they need to deliver excellent service to our customers
Automate common, repeatable tasks using Ansible and scripting languages
Triage and manage escalation of cases
Perform deliberate and structured troubleshooting
Share the on-call rotation and act as an escalation contact for incidents
Influence design / architecture of services to proactively prevent system failures
CRITICAL SKILLS:
5-8 years of experience focused on site reliability, DevOps Engineering, system administration or application development
Strong hands-on experience in:
Linux/Unix and Windows OS
Network architecture and security configurations
Hands-on experience with the following scripting technologies:
Automation/configuration management using Ansible, Puppet, Chef, or an equivalent
Python, Ruby, Bash, PowerShell
Hands-on experience with IaC (Infrastructure as Code) tools such as Terraform or CloudFormation
Hands-on experience with cloud infrastructure such as AWS, Azure, or GCP
Bachelor’s Degree in Computer Science or related field
Think like an attacker
Excellent communication skills
Strong attention to detail
Strong hands-on technical abilities
Strong computer literacy and/or the comfort, ability and desire to advance technically
Strong understanding of Information Security in various environments
Demonstrated ability to work independently and take sole responsibility for tasks
Ability to keep track of numerous detail-intensive, interdependent tasks and ensure their accurate completion