Limitless High-tech career opportunities - Expoint

Palo Alto Staff Site Reliability Engineer Observability
India, Karnataka, Bengaluru
110877440

08.10.2025

Being the cybersecurity partner of choice, protecting our digital way of life.

Your Impact

Implement and support the Linux infrastructure as code on which our globally distributed customer-facing platform runs.
Provision, configure, and support a resilient hybrid cloud deployment architecture using the automation framework, and make it more efficient.
Manage the Linux infrastructure CI/CD platform; work with other SREs to deploy and maintain the automation framework; handle capacity planning; create and review PKI operational runbooks.
Manage scalability, capacity planning, redundancy, and resiliency.
Maintain service availability and performance SLAs based on business and product requirements.
Contribute to documentation related to design, deployment, validation, operations and DR/BCP.
Design proactive service monitoring, alerting and trend analysis of underlying infrastructure, and support the operations team in implementation.
Build and operate compute fabric for thousands of VMs and Kubernetes clusters. Develop scripts, build tools, and write code to automate routine tasks.
Provide technical support to platform users.
Respond to security implementation and audits of the environment.
Plan maintenance windows, write up change requests, present technical updates.
Participate in on-call support, including RCA as required.
Design and implement network, compute, and application-level monitoring solutions.
Implement integrated and automated processes that drive operational excellence.
Advise on industry best practices as they relate to new product selection.
Drive operational cadences around business planning and performance management to ensure the efficient running of the IT org.

Your Experience

Bachelor's or Master's degree in Computer Science, Information Technology, or an equivalent technical stream, with a minimum of 5 years of work experience required.
Experience designing, implementing, and maintaining comprehensive monitoring and observability solutions, including implementing and managing observability frameworks, with a solid understanding of MELT (Metrics, Logs, Events, Traces).
Strong working experience with and exposure to containers and orchestration (Docker, Kubernetes).
Experience with administration and orchestration of cloud computing (AWS, GCP, etc.) running virtual or container environments.
Infrastructure as Code knowledge: Terraform, Ansible, Git, Puppet.
Fluent scripting skills, preferably in Python, Shell, or Bash.
Proficient in CI/CD platforms such as Jenkins, CircleCI, etc.
Background knowledge of network and security technologies.
Experience in developing and managing APIs, with an understanding of API infrastructure optimization and security.
Ability to work cross-functionally across multiple business units, such as product development and engineering.
Must be able to collaborate with a global team spread across multiple time zones.
Passion, drive, energy, a sense of humour and a great attitude!
Knowledge of AIOps and the application of Machine Learning/Artificial Intelligence in cloud infrastructure, observability, or IT operations.

Additional experience in one or more of the following areas is a big plus

Development of self-healing infrastructure and applications.
Understanding of big data and data analytics theory and application.
Exposure to enterprise business applications, ITSM frameworks, and tools.

All your information will be kept confidential according to EEO guidelines.
