Expoint - all jobs in one place

NICE Senior Specialist Site Reliability Engineer 
India, Maharashtra, Pune 
949780088

30.06.2024

How will you make an impact?

  • Run the production environment by monitoring availability and taking a holistic view of system health
  • Build software and systems to manage platform infrastructure and applications
  • Improve reliability, quality, and time-to-market of our suite of software solutions
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
  • Provide primary operational support and engineering for multiple large distributed software applications
  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
  • Partner with development teams to improve services through rigorous testing and release procedures
  • Participate in system design consulting, platform management, and capacity planning
  • Create sustainable systems and services through automation and uplifts
  • Balance feature development speed and reliability with well-defined service level objectives

Have you got what it takes?

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • 8-15 years of working experience in a similar role, with a focus on systems engineering, automation, and reliability.
  • Proficiency in at least one programming language (e.g., Python, Go, Java, C#) and experience with scripting languages (e.g., Bash, PowerShell).
  • Deep understanding of cloud computing platforms (e.g., AWS) and of the operation and reliability constraints of their prominent services (e.g., EC2, ECS, Lambda, DynamoDB).
  • Experience with infrastructure-as-code tools such as CloudFormation or Terraform.
  • Deep understanding of CI/CD concepts and experience with CI/CD tools such as Jenkins, GitLab CI/CD, or CircleCI.
  • Strong knowledge of containerization technologies (e.g., Docker, Kubernetes) and microservices architecture.
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, CloudWatch).
  • Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
  • Experience with incident management and blameless postmortems, including driving incident response during outages and other critical incidents, through resolution and communication, in a cross-functional team setup.
  • Hands-on experience working with large Kubernetes clusters. Certification is a plus.
  • Working experience with the Grafana observability suite (Loki, Mimir, Tempo).
  • Administration and/or development experience with standard monitoring and automation tools such as Splunk, Datadog, and PagerDuty Rundeck.
  • Familiarity with configuration management tools like Ansible, Puppet, or Chef.
  • Certifications such as AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or equivalent.

You will have an advantage if you also have:

  • Experience with or knowledge of other cloud platforms is an added advantage.

Tech Manager, Engineering
Individual Contributor