Expoint – all jobs in one place

Palo Alto: Principal AI Engineer, Enterprise Platform
United States, California 
241501803

23.06.2025

Being the cybersecurity partner of choice, protecting our digital way of life.

Your Career

As a Principal AIOps Engineer for the Enterprise AI Platform, you will be a pivotal technical leader responsible for designing, developing, and implementing AI-driven solutions to enhance the reliability, performance, and efficiency of our critical IT and business systems. You will leverage the core AI platform to build sophisticated AIOps capabilities, transforming how we monitor, manage, and optimize our digital infrastructure and applications. This role requires a deep understanding of IT operations, machine learning, and scalable system design to proactively identify issues, automate remediation, and drive continuous improvement across the enterprise.

Your Impact

  • AIOps Platform Development: Design, develop, and implement advanced AIOps solutions, leveraging machine learning algorithms and data analytics to automate and enhance IT operations. This includes developing real-time processing solutions for observability data (e.g., logs, metrics, events, traces).
  • Anomaly Detection & Predictive Analytics: Lead the implementation of AI/ML models for proactive anomaly detection, root cause analysis, and predictive insights into system health and performance across applications and infrastructure at enterprise scale.
  • Intelligent Automation & Orchestration: Drive the automation of routine operational tasks, incident response, and remediation workflows using AI-driven agents and orchestration tools, minimizing manual intervention and improving operational efficiency.
  • Observability & Data Integration: Collaborate with observability teams to ensure the efficient collection, processing, and transformation of high-volume, cross-domain data from diverse sources (events, logs, metrics, tickets, monitoring tools) into actionable intelligence for the AIOps platform.
  • Incident Management & Remediation: Integrate AIOps insights with existing incident management systems, providing real-time intelligence to rapidly identify, diagnose, and resolve IT issues, leading to proactive issue resolution and reduced mean time to recovery (MTTR).
  • Performance Optimization: Utilize AI insights to continuously monitor, analyze, and fine-tune IT systems for peak operational efficiency, capacity planning, and resource optimization.
  • Technical Leadership & Mentorship: Provide technical leadership and mentorship to other engineers, promoting architectural excellence, innovation, and best practices in AIOps development and operations.
  • Cross-Functional Collaboration: Partner with data scientists, ML engineers, software engineers, SREs, and IT operations teams to integrate AI/ML agents into the platform and ensure AIOps solutions align with business needs and deliver measurable ROI.
  • Innovation & Research: Actively research and evaluate emerging AIOps technologies, including generative AI, LLMs, AI-driven ChatOps, and advanced retrieval-augmented generation (RAG) techniques, bringing promising innovations into production through POCs and long-term architectural evolution.

Your Experience

  • 10+ years of experience in software engineering, reliability engineering, or IT operations, including at least 5 years leading the design and implementation of AIOps solutions at scale.
  • Proven expertise in applying machine learning algorithms and data analysis techniques to solve complex IT operational challenges.
  • Strong hands-on experience in building and maintaining scalable data pipelines and workflows for efficient data collection, processing, and analysis from diverse IT sources.
  • Proficiency in programming languages such as Python, Go, Java, or Scala.
  • Extensive experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and containerization technologies (e.g., Docker, Kubernetes).
  • Familiarity with data processing frameworks (e.g., Apache Kafka, Apache Spark) and IT monitoring tools (e.g., Prometheus, Grafana, Datadog, Splunk).
  • Deep understanding of distributed systems architecture, microservices, and their operational challenges.
  • Demonstrated ability to translate business requirements and operational pain points into technical specifications and deliver robust AIOps solutions.
  • Excellent problem-solving skills and the ability to troubleshoot complex platform-related issues.
  • Strong communication and interpersonal skills, with a track record of influencing technical and cross-functional stakeholders.
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.

Preferred Qualifications

  • Master's degree or Ph.D. in Computer Science, Machine Learning, or a related technical field.
  • Experience with agentic systems and AI agents for automation.
  • Experience with DevOps practices and CI/CD pipelines in an AIOps context.
  • Prior experience in cybersecurity operations or building AIOps solutions for security threat detection and response.

Compensation Disclosure

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected /YR. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found .

All your information will be kept confidential according to EEO guidelines.