4 years of experience in incident management for large-scale, customer-facing retail applications, with a focus on impact-driven prioritization, root cause analysis, and timely resolution
4 years of demonstrated troubleshooting, problem-solving, and debugging in dynamic production environments
4 years of hands-on experience in observability and monitoring using tools like Splunk and Prometheus, with expertise in creating complex queries and insightful dashboards
4 years of experience with at least one scripting language, such as Python, for automation and efficient system management
BS in Computer Science or equivalent work experience
2 years of experience working with relational and NoSQL databases such as Oracle and Cassandra, including writing and optimizing complex queries for scalable and efficient data access
Willingness to participate in on-call rotations and provide weekend coverage as needed
Experience communicating complex technical concepts to both technical and non-technical stakeholders
Strong problem-solving, software development, and debugging skills
Proven track record of taking ownership and successfully delivering results