YOUR TYPICAL DAY HERE WOULD BE:
- Design and build observability dashboards using Dynatrace and Grafana.
- Develop and implement strategies to improve system reliability, performance, and efficiency.
- Analyze performance data, identify bottlenecks, and implement solutions to optimize system performance.
- Develop and implement automation solutions to streamline operations, improve efficiency, and reduce manual intervention in infrastructure management and deployment processes.
- Collaborate with security service teams to implement Security Reliability Engineering practices.
- Define and implement best practices for monitoring, alerting, and incident response to proactively identify and address potential issues.
- Drive continuous improvement initiatives to enhance the reliability, scalability, and performance of our systems through automation and infrastructure optimization.
- Collaborate with development teams to identify and address performance bottlenecks.
WHAT YOUR SKILLSET LOOKS LIKE:
- A relevant Bachelor's or Master's degree in computer science or engineering.
- 7+ years of experience as a Site Reliability Engineer or DevOps Engineer, or in a similar role.
- Hands-on experience with Dynatrace, Grafana, and Splunk.
- Experience setting up logging and monitoring services (Dynatrace, Grafana, GCP Operations Suite) and developing dashboards.
- Understanding of incident management processes and best practices.
- Expertise in automation and scripting languages (Python, Go, Bash).
- Knowledge of GCP and experience configuring infrastructure with infrastructure-as-code tools such as Terraform.
- Demonstrated ability to drive continuous improvement and innovation in SRE and automation practices.
- Excellent problem-solving and analytical skills.
WOULD BE GREAT IF YOU ALSO BRING:
- GCP or DevOps certification.