Expoint - all jobs in one place



Cisco Site Reliability Engineer 
United States, Georgia, Atlanta 
438165391

10.06.2024

About The Role

What You Will Do

● Design and implement visibility into our platform as we grow to multi-region scale.

● Design, deploy, and maintain cloud-native monitoring services in AWS and GCP that are elastic and resilient to failure.

● Provide standards and best practices for instrumenting container-based services and cloud-managed services.

● Maintain our alerting pipeline so that we are notified of the right things, at the right time, in the right places.

● Drive automation wherever possible, enabling our monitoring platforms to scale effortlessly. Think self-service.

● Participate in and help improve our 24x7 incident response and on-call rotation.

● Strong Infrastructure as Code skills, ideally with Terraform and Kubernetes.

● Strong knowledge of modern logging toolsets, including Logstash or Fluentd.

● Understanding of Prometheus and its ecosystem, including Alertmanager.

● Good knowledge of application performance monitoring (APM) and crash-reporting tools, such as Sentry.

● Good knowledge of cloud provider managed services and how they can be leveraged in our context.

● Ability to write high-quality code in Python, Go, or equivalent languages.