Executes creative software solutions, design, development, and technical troubleshooting, with the ability to think beyond routine or conventional approaches to build solutions or break down technical problems
Develops secure high-quality production code, and reviews and debugs code written by others
Identifies opportunities to eliminate or automate remediation of recurring issues to improve overall operational stability of software applications and systems
Leads communities of practice across Software Engineering to drive awareness and use of new and leading-edge technologies
Serves as the primary authority on Non-Functional Requirements, ensuring that all systems meet or exceed performance, reliability, and scalability standards.
Develops and implements observability standards to provide comprehensive insights into system performance and user experience.
Designs and maintains robust telemetry systems to collect and analyze data from both public and private cloud environments.
Establishes and manages monitoring and alerting frameworks to proactively identify and resolve issues before they impact users.
Collaborates with cross-functional teams to integrate SRE best practices into the software development lifecycle.
Drives continuous improvement initiatives to enhance system reliability and operational efficiency.
Provides technical leadership and mentorship to engineering teams on SRE principles and practices.
Stays current with industry trends and emerging technologies to ensure our systems remain at the forefront of innovation.
Required qualifications, capabilities, and skills
Formal training or certification in software engineering concepts and 5+ years of applied experience
Hands-on practical experience delivering system design, application development, testing, and operational stability
Advanced proficiency in one or more programming languages
Proficiency in automation and continuous delivery methods
Proficient in all aspects of the Software Development Life Cycle
Proven experience in a senior SRE or similar role, with a strong track record of managing complex cloud-based systems.
Deep understanding of public and private cloud architectures, including AWS and on-premises solutions.
Expertise in observability tools and practices, such as Prometheus, Grafana, ELK Stack, or similar technologies.
Strong knowledge of telemetry, monitoring, and alerting frameworks.
Excellent problem-solving skills and the ability to work independently as an individual contributor.
Strong communication and collaboration skills, with the ability to influence and drive change across teams.
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.