Leads initiatives to improve the reliability and stability of applications and platforms, using data-driven analytics to raise service levels
Collaborates with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt.
Writes and maintains code in Java, Python, or a similar language, and in Angular or a similar framework, to build and enhance observability tools and platforms. Automates repetitive tasks to improve system reliability and developer productivity
Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks.
Provides overall direction, oversight, and coaching for a team of entry-level to mid-level software engineers who work on basic to moderately complex tasks
Is accountable for decisions that influence teams’ resources, budget, tactical operations, and the execution and implementation of processes and procedures
Ensures successful collaboration across teams and stakeholders
Identifies and mitigates issues to execute a book of work while escalating issues as necessary
Provides input to leadership regarding budget, approach, and technical considerations to improve operational efficiencies and functionality for the team
Creates a culture of diversity, equity, inclusion, and respect for team members and prioritizes diverse representation
Required qualifications, capabilities, and skills
Formal training or certification in software application concepts and 5+ years of applied experience. In addition, 2+ years of experience leading technologists to manage and solve complex technical problems within your domain of expertise.
Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines. Proficiency in languages, frameworks, and tools such as Java, Python, Angular, and Terraform.
Experience developing and maintaining systems for effective monitoring, logging, and tracing of software applications. This includes choosing appropriate tools and technologies, setting up dashboards, and ensuring the scalability and reliability of the observability infrastructure.
Advanced knowledge of observability tools and platforms (e.g., Dynatrace, Splunk, Grafana)
Extensive experience in a similar SRE or observability role.
Experience participating in strategic planning for the technology roadmap, including scalability, cost-effectiveness, and risk management considerations related to observability infrastructure.
Excellent troubleshooting and problem-solving skills. Ability to identify and solve problems related to complex data structures and algorithms.
Drive to self-educate and evaluate new technology. Ability to teach new programming languages to team members.
Strong leadership and management experience, with the ability to lead, guide, and mentor a team.
Experience with hiring, developing, and recognizing talent