Expoint – all jobs in one place

Cisco XDR Senior Site Reliability Engineer
United States, North Carolina, Cary 
448656900

09.07.2025

Your Impact

As a Senior Site Reliability Engineer, you will play a leading role in improving the efficiency, scalability, and reliability of the XDR Incident Generation team. Your work will focus on implementing sophisticated automation, fostering a culture of operational excellence, and driving the adoption of Infrastructure as Code (IaC) and CI/CD best practices. You will also mentor team members, design robust platforms and services, and manage their lifecycle end to end.

Key Responsibilities

  • Manage the full lifecycle of platform services, from design and implementation to maintenance.
  • Promote and enforce Infrastructure-as-Code (IaC) practices to enable scalable, version-controlled, and auditable infrastructure.
  • Lead the automation of build, deploy, and release processes to boost team efficiency and innovation.
  • Design, develop, and maintain modern CI/CD pipelines aligned with industry best practices.

Minimum Qualifications

  • Extensive experience with AWS services (including VPC, S3, Lambda, SQS, Network Firewall, ECS/EKS, IAM, DynamoDB or CloudWatch) along with expertise in AWS security and/or cost optimization.
  • Proficiency with Infrastructure-as-Code tools such as Terraform, and with scripting or programming languages such as Python and/or Bash.
  • Experience building and maintaining CI/CD pipelines with tools such as GitHub Actions or TeamCity, combined with strong knowledge of incident management, postmortem analysis, and/or managing SLOs/SLAs.
  • Ability to participate in an on-call rotation.

Preferred Qualifications

  • Bachelor's degree with 7 years, or Master's degree with 4 years, of related experience.
  • Ability to collaborate across teams and communicate technical concepts with transparency and precision.
  • Experience mentoring junior engineers, fostering skill development, and upholding SRE best practices within the team.
  • Expertise in crafting AI-driven workflows for incident response, forecasting potential issues (e.g., resource exhaustion, outages), and enabling auto-scaling or automated remediation.
  • Proficiency in integrating AI/ML tools for anomaly detection, threat response, and serverless architecture optimization on AWS.