Cisco XDR Sr Site Reliability Engineer
United States, North Carolina, Cary 
377982494

28.07.2025


Your Impact
As a Senior Site Reliability Engineer, you will play a leading role in enhancing the efficiency, scalability, and reliability of the XDR Incident Generation team. Your work will focus on implementing advanced automation, fostering a culture of operational excellence, and driving the adoption of Infrastructure as Code (IaC) and CI/CD best practices. Additionally, you'll mentor team members, design robust platforms and services, and ensure their seamless lifecycle management.


Key Responsibilities

  • Manage the full lifecycle of platforms and services, from design and implementation to maintenance.
  • Promote and enforce Infrastructure-as-Code (IaC) practices to enable scalable, version-controlled, and auditable infrastructure.
  • Lead the automation of build, deploy, and release processes to boost team productivity and innovation.
  • Design, develop, and maintain modern CI/CD pipelines aligned with industry best practices.
  • Participate in the on-call rotation.

Minimum Qualifications

  • Extensive experience with AWS services (such as VPC, S3, Lambda, SQS, Network Firewall, EKS, IAM, DynamoDB, or CloudWatch), along with expertise in AWS security and cost optimization.
  • Proficiency in Infrastructure-as-Code tools such as Terraform, and/or scripting/programming languages including Python and/or Bash.
  • Experience in building and maintaining CI/CD pipelines using tools like GitHub Actions, ArgoCD, or TeamCity, combined with robust knowledge of incident management, postmortem analysis, and/or tracking SLOs/SLAs.

Preferred Qualifications

  • Bachelor's degree + 7 years of related experience, or Master's degree + 4 years of related experience.
  • Ability to collaborate across teams, communicating technical concepts with clarity and precision.
  • Experience mentoring junior engineers, fostering skill development, and championing SRE best practices within the team.
  • Expertise in designing AI-driven workflows for incident response, forecasting potential issues (e.g., resource exhaustion, outages), and enabling auto-scaling or remediation.
  • Proficiency in integrating AI/ML tools for anomaly detection, threat response, and serverless architecture optimization on AWS.