Palo Alto Principal Site Reliability Engineer Prisma AIRS 
United States, California 
180954940

Being the cybersecurity partner of choice, protecting our digital way of life.

Your Career

A Principal Site Reliability Engineer in Prisma AIRS embodies integrity, creativity, and a tireless dedication to continuous improvement. You will have the opportunity to design, build, and operate cutting-edge cloud-native applications from the ground up at massive scale. We're looking for resourceful and discerning engineers with a diverse technology background and a bias for action who will accelerate the team with creativity, experience, and clean code.

Your Impact

  • Operate Prisma AIRS Cloud Services through contemporary Reliability Engineering practices.

  • Design, Build, Operate and Secure Cloud-Native Microservice Applications at Global Scale.

  • Own End-to-End Service Delivery in Production - Availability, Performance, Scalability, Security.

  • Partner with Software & ML Engineers to design and build new capabilities and features.

  • Banish toil through automation - from shell scripting to cluster orchestration to dynamic CI pipelines.

  • Gain a deep understanding of how we deliver AI Security; you'll be able to troubleshoot a production issue end-to-end, from an inbound HTTP request through the network, webserver, model inferencing, and database, down to the hardware layer.

Your Experience

  • You must be an expert in all things Kubernetes; you have a deep understanding of Kubernetes concepts, experience with building and operating production applications in multi-cluster environments, writing Helm charts from scratch and interacting with the Kubernetes API.

  • You must be an expert in either GCP or AWS, with at least 5 years of experience building and operating production cloud infrastructure at scale.

  • You must have significant Software Engineering / Development experience building applications in Go and/or Python.

  • You should have demonstrated experience in network operations, such as cloud networking, network security, and/or distributed computing systems.

  • You should have demonstrated experience in Linux administration, particularly in the context of cloud-native distributed systems, container runtimes, or Linux server fleets.

  • You should have experience with Relational Databases and SQL; you know how to read, write and refactor SQL queries, identify opportunities for and design secondary indexes, manage database objects such as tables, views, stored procedures, and perform backup/restore operations.

  • You should have experience designing, building and maintaining CI and/or GitOps pipelines for complex multi-application/multi-environment projects.

  • You should have experience in building application observability through Prometheus / OpenTelemetry metrics, Structured Logging or Distributed Tracing systems.

  • You may have practical experience in Information Security, such as Cloud / Application / Network Security, and be familiar with compliance programs such as SOC 2, ISO/IEC 27001, PCI-DSS, FedRAMP or control frameworks such as MITRE ATT&CK, NIST 800-53, OWASP or others.

  • You may have experience with running LLM / Machine Learning Inferencing Servers at scale across heterogeneous multi-GPU cloud environments.

Compensation Disclosure

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be between $151,600 and $245,294 per year. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found .

All your information will be kept confidential according to EEO guidelines.