Being the cybersecurity partner of choice, protecting our digital way of life.
Your Career
Palo Alto Networks is moving rapidly toward a future in which cloud-based applications are increasingly common. As a Site Reliability Engineer, you will develop the frameworks and pathways that help move our internal applications to microservices. You will be a critical link between engineering and the Infrastructure Platform, building Infrastructure as Code and working in partnership with application developers to deploy applications in GCP, AWS, and data centers across the globe.
Your Impact
- Write automation code for provisioning and operating infrastructure at massive scale
- Design, build and operate Cloud infrastructure to enable reliable and rapid deployment of microservices with effective monitoring and resilient operations
- Work with development teams to make sure applications are production-ready, scalable, and reliable from the ground up
- Identify and drive opportunities to improve automation for code deployment, management, and visibility of application services
- Develop tools and frameworks to automate operational tasks and the deployment of machines, services, and applications
- Establish end-to-end monitoring and alerting on all critical components of the application
- Participate in the on-call rotation supporting the platform and/or the production application
- Direct root cause analysis of critical business and production issues
- Develop and mentor other SREs on standard methodologies, from infrastructure orchestration to troubleshooting application services in production
- Represent SRE in design reviews and work cross-functionally with Engineering teams on operational readiness
Your Experience
- BS/MS in Computer Science or Computer Engineering or equivalent military experience required
- Expertise in configuration management with frameworks such as Terraform, Ansible, and Helm
- Strong Linux administration, internals, and network troubleshooting
- Experience in DevOps, Site Reliability, or infrastructure engineering
- Expertise in Google Cloud Platform (GCP) and its related services
- Proficiency with a programming language like Python and shell scripting to automate tasks
- Strong experience with CI/CD pipelines, GitHub, Jenkins, and Artifactory
- Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
- Strong fundamentals in HTTP including HTTP headers and web servers
- Excellent problem solving, critical thinking, communication, and teamwork skills
- Excellent written and verbal communication, able to collaborate and rally support
- Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
- Passion for automation and monitoring instrumentation as code
- Excellent interpersonal skills and the ability to work well in a team
- Passion for learning, understanding, and dissecting new technology stacks quickly and independently
- Experience building and managing large relational database clusters (MySQL, Percona, etc.) is a plus
We define the industry instead of waiting for directions. We need individuals who feel comfortable in ambiguity, are excited by the prospect of a challenge, and are empowered by the unknown risks facing our everyday lives that are only enabled by a secure digital environment.
Compensation Disclosure
The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be between $126,000 and $203,500 per year. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found .
All your information will be kept confidential according to EEO guidelines.