Being the cybersecurity partner of choice, protecting our digital way of life.
Your Impact
- Provision, configure, and support resilient hybrid cloud deployment architectures using the automation framework
- Collaborate with development teams to ensure applications are production-ready, scalable, and reliable from the outset
- Manage the CI/CD platform and Linux infrastructure, and collaborate with other SREs to deploy and maintain the automation framework, perform capacity planning, and create and review operational runbooks
- Set up critical infrastructure and develop tools and frameworks to automate operational tasks, including the deployment of machines, services, and applications
- Participate in the Incident Command on-call rotation supporting critical applications and services
- Conduct root cause analysis of critical business and production issues and drive measures to prevent recurrence
- Manage scalability, capacity planning, redundancy, and resiliency
- Maintain service availability and performance SLAs based on business and product requirements
- Contribute to documentation related to design, deployment, validation, and operations
- Design proactive service monitoring, alerting, and trend analysis of the underlying infrastructure, and support the operations team in implementation
- Establish end-to-end monitoring and alerting on all critical components of the application
 
Your Experience
- 6+ years of systems engineering experience on mission-critical, enterprise-level systems
- 6+ years of experience using Infrastructure-as-Code to build large-scale environments, mainly on Linux platforms (Ubuntu, SUSE, CentOS)
- 3+ years of experience working with cloud environments, primarily Google Cloud Platform
- Demonstrated Linux/systems experience in a hybrid (cloud and on-prem) environment
- Strong experience with CI/CD pipelines, GitHub, Jenkins, and Artifactory
- Must have a strong foundation in Linux operating systems, including troubleshooting, design, and implementation
- Expertise in configuration management with frameworks such as Terraform, Ansible, and Helm
- Experience with Linux vulnerability management processes and patching
- Must have programming knowledge in Python, Bash, Perl, or Go to automate infrastructure workflows
- Understanding of software development methodologies and practices, including agile development, continuous integration, and continuous delivery
- Understanding of network firewalls, load balancers, and complex network designs
- Experience with monitoring technologies such as Datadog, Nagios, Graphite, Cacti, and Grafana
- Understanding of Kubernetes, the container lifecycle, and container troubleshooting
- Hands-on knowledge of high-availability approaches such as load balancing, failover, clustering, and disaster recovery
- Excellent problem-solving, critical thinking, communication, and teamwork skills
- Passion, drive, energy, a sense of humor, and a great attitude
 
All your information will be kept confidential according to EEO guidelines.