
CyberArk Associate Site Reliability Engineer
India
946096870

Yesterday

Responsibilities:

  • Incident Management, Monitoring and Alerting: Drive incident response processes and troubleshoot complex issues, ensuring timely resolution of outages. Establish monitoring, logging, and alerting best practices using tools like Datadog, Site24x7, etc.
  • Tooling and Automation: Build essential tooling to improve system reliability and automate the remediation of issues.
  • On-Call: Be part of the on-call rotation (24x7, 365 days a year).
  • SOP Documentation: Create and maintain documentation for infrastructure, processes, and incident management protocols.
  • Understanding of Infrastructure as Code (IaC) tools such as Terraform and Ansible to automate the provisioning, configuration, and deployment processes.
  • Attend all training programs and complete all tasks set by the supervisor and assist other trainees wherever possible.
  • Cloud Platform Expertise: Hands-on experience with AWS cloud services, including EC2, S3, VPC, RDS, EKS, ECS, CloudFormation and more.
  • CI/CD Pipelines: Fair understanding of CI/CD pipelines using tools like Jenkins.
  • Monitoring and Alerting: Hands-on experience with monitoring and alerting tools like ELK, Datadog, CloudWatch, Grafana, etc., to proactively identify and resolve issues.
  • Performance Tuning: Continuously optimize system performance, identify bottlenecks, and implement strategies to improve scalability and efficiency.
  • Cost Optimization: Identify and implement strategies to reduce cloud costs while maintaining performance and reliability.
  • Security Best Practices: Adhere to security best practices and implement measures to protect infrastructure and data from vulnerabilities and threats.
  • Collaboration and Communication: Work effectively with cross-functional teams to understand business requirements and provide technical guidance.

Required Skills and Experience:

  • 2-3 years of experience as a Site Reliability Engineer.
  • Strong proficiency in AWS cloud services like EC2, S3, VPC, RDS, EKS, ECS, CloudFormation and more. AWS Certification helps.
  • Good logical, analytical, and problem-solving skills.
  • Strong communication skills and the ability to work in shifts (24x7).
  • Strong scripting skills (Python, PowerShell, CDK, Shell scripting).
  • Understanding of infrastructure as code tools (Terraform, Ansible) and AWX Tower for Ansible automation.
  • Knowledge of containerization (Docker) and orchestration platforms (Kubernetes).
  • Expertise in CI/CD pipelines and automation tools (Jenkins, GitHub).
  • Exposure to monitoring and alerting tools (CloudWatch, Datadog, ELK, Grafana, Site24x7).
  • Experience documenting SOPs and RCAs.
  • Understanding of security best practices and compliance standards. Security Certification is a plus.