
Palo Alto: Staff Site Reliability Engineer, Observability
India, Karnataka, Bengaluru 
110877440

Yesterday

Being the cybersecurity partner of choice, protecting our digital way of life.

Your Impact

  • Implement and support, as code, the Linux infrastructure on which our globally distributed, customer-facing platform runs.

  • Provision, configure, and support a resilient hybrid-cloud deployment architecture using the automation framework, and continually make it more efficient.

  • Manage the CI/CD platform for the Linux infrastructure; work with other SREs to deploy and maintain the automation framework and to plan capacity; create and review PKI operational runbooks.

  • Manage scalability, capacity planning, redundancy, and resiliency.

  • Maintain service availability and performance SLAs based on business and product requirements.

  • Contribute to documentation covering design, deployment, validation, operations, and disaster recovery / business continuity planning (DR/BCP).

  • Design proactive service monitoring, alerting, and trend analysis for the underlying infrastructure, and support the operations team in implementing them.

  • Build and operate a compute fabric for thousands of VMs and Kubernetes clusters. Develop scripts, build tools, and write code to automate routine tasks.

  • Provide technical support to platform users.

  • Support security implementations and respond to security audits of the environment.

  • Plan maintenance windows, write change requests, and present technical updates.

  • Participate in on-call support, including root cause analysis (RCA) as required.

  • Design and implement network-, compute-, and application-level monitoring solutions.

  • Implement integrated and automated processes that drive operational excellence.

  • Advise on industry best practices as they relate to new product selection.

  • Drive operational cadences around business planning and performance management to ensure the IT organization runs efficiently.

Your Experience

  • Bachelor's or Master's degree in Computer Science, Information Technology, or a related technical field, or an equivalent combination of education and experience, with a minimum of 5 years of work experience.

  • Experience designing, implementing, and maintaining comprehensive monitoring and observability solutions, including implementing and managing observability frameworks, with a solid understanding of MELT (Metrics, Events, Logs, Traces).

  • Strong hands-on experience with containers and container orchestration (Docker, Kubernetes).

  • Experience administering and orchestrating cloud computing platforms (AWS, GCP, etc.) that run virtualized or containerized environments.

  • Knowledge of Infrastructure as Code and related tooling: Terraform, Ansible, Git, Puppet.

  • Fluent scripting skills, preferably in Python, shell, or Bash.

  • Proficiency with CI/CD platforms such as Jenkins or CircleCI.

  • Background knowledge of networking and security technologies.

  • Experience developing and managing APIs, and an understanding of API infrastructure optimization and security.

  • Ability to work cross-functionally across multiple business units, such as product development and engineering.

  • Must be able to collaborate with a global team spread across multiple time zones.

  • Passion, drive, energy, a sense of humour and a great attitude!

  • Knowledge of AIOps and of applying machine learning / artificial intelligence to cloud infrastructure, observability, or IT operations.

Additional experience in one or more of the following areas is a big plus:

  • Development of self-healing infrastructure and applications.

  • Understanding of big data, and of data analytics theory and application.

  • Exposure to enterprise business applications and to ITSM frameworks and tools.

All your information will be kept confidential according to EEO guidelines.