Being the cybersecurity partner of choice, protecting our digital way of life.
Your Impact
- Provision, configure, and support resilient hybrid cloud deployment architectures using the automation framework
- Collaborate with development teams to ensure applications are production-ready, scalable, and reliable from the outset
- Manage the CI/CD platform and Linux infrastructure, and collaborate with other SREs to deploy and maintain the automation framework, perform capacity planning, and create and review operational runbooks
- Set up critical infrastructure and develop tools and frameworks to automate operational tasks, including the deployment of machines, services, and applications
- Participate in the Incident Command on-call rotation supporting critical applications and services
- Conduct root cause analysis of critical business and production issues and drive future preventive measures
- Manage scalability, capacity planning, redundancy, and resiliency
- Maintain service availability and performance SLAs based on business and product requirements
- Contribute to documentation related to design, deployment, validation, and operations
- Design proactive service monitoring, alerting, and trend analysis of underlying infrastructure, and support the operations team in implementation
- Establish end-to-end monitoring and alerting on all critical components of the application
Your Experience
- 6+ years of system engineering experience on mission-critical, enterprise-level systems
- 6+ years of experience using Infrastructure-as-Code to build large-scale environments, mainly on Linux platforms (Ubuntu, SUSE, CentOS)
- 3+ years of experience working with cloud environments, primarily Google Cloud Platform
- Demonstrated Linux/Systems experience in a hybrid (cloud, on-prem) environment
- Strong experience with CI/CD pipelines and tooling such as GitHub, Jenkins, and Artifactory
- Must have a strong foundation in Linux operating systems, including troubleshooting, design, and implementation
- Expertise in configuration management and provisioning with frameworks such as Terraform, Ansible, and Helm
- Experience with Linux vulnerability management process and patching
- Must have programming knowledge of Python, Bash, Perl, or Go to automate infrastructure workflows
- Understanding of software development methodologies and practices, including agile development, continuous integration, and continuous delivery
- Understanding of network firewalls, load balancers, and complex network designs
- Experience with monitoring technologies such as Datadog, Nagios, Graphite, Cacti, and Grafana
- Understanding of Kubernetes, the container lifecycle, and container troubleshooting
- Hands-on knowledge of high-availability approaches such as load balancing, failover, clustering, and disaster recovery
- Excellent problem-solving, critical thinking, communication, and teamwork skills
- Passion, drive, energy, a sense of humor, and a great attitude
All your information will be kept confidential according to EEO guidelines.