• Enhance System Observability: Implement and maintain robust observability solutions that provide real-time insight into the performance and health of our systems, so that potential issues are identified and addressed before they affect users.
• Troubleshooting and Root Cause Analysis: Use your expertise to investigate and resolve incidents quickly during crisis situations, and perform root cause analysis to prevent recurrence.
• Automation: Leverage your coding skills to create tools and automate runbooks to improve efficiency.
• Documentation: Document and manage runbooks and best practices to ensure knowledge sharing and team efficiency.
• Communication: Bring strong interpersonal skills and the ability to work effectively across multiple business and technical teams.