Design, develop, and manage SLO-based observability solutions, including metric identification/validation, centralization in GEM/Prometheus, and visualization in Grafana dashboards.
Write and manage complex queries and alert definitions.
Bridge the gap between Operations Support teams and SRE operations.
Configure and manage monitoring, alerts, and observability using a range of tools including GEM, Splunk, Netcool, ELK, and AIM.
Maintain deep technical knowledge and operational experience with tools like AppDynamics, GEM, AIM/ELK, Splunk, Prometheus, and Grafana.
Understand and write code (Java, Python, Ansible, etc.), programs, config files, and complex queries.
Implement and manage Infrastructure as Code (IAC) using Ansible and Python.
Establish design patterns for monitoring and benchmarking SLOs.
Provide thought leadership and strategy in implementing and maintaining observability solutions.
Create and maintain operational process documentation for observability solutions.
Optimize the Observability Suite for monitoring applications and infrastructure.
Qualifications:
6-10+ years’ experience in an Application Development or Support role. Relevant experience in a critical software engineering role with high business impact.
Proven experience in SLO/SLI creation, observability dashboard design/development, and an understanding of business workflows across Public and On-Prem containerized/microservice environments.
Excellent working knowledge of key computer science & engineering concepts (networking, operating systems, virtualization, containerization, etc.).
Experience with IAC automation and CI/CD pipeline creation and management is desirable.
Experience in senior stakeholder management.
Highly assertive communication skills and a commanding personality. Ability to engage a large audience and lead discussions clearly and articulately. Must show confidence in all communications.
Consistently demonstrates clear and concise written and verbal communication skills.
Ability to plan and organize workload.
Education:
Bachelor’s degree/University degree or equivalent experience.