Job responsibilities
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 10+ years of applied experience. In addition, 5+ years of experience leading technologists to manage, anticipate, and solve complex technical problems within your domain of expertise
- Hands-on practical experience delivering system design, application development, testing, and operational stability
- Extensive experience in a similar SRE or observability role, implementing SRE principles and practices to improve system reliability and availability
- Hands-on experience with observability tooling such as Dynatrace, OpenTelemetry (OTel), Grafana, Prometheus, and CloudWatch
- Expertise in one or more programming languages or frameworks (Java, Angular, Python) and in Terraform
- Proficiency in cloud platforms (e.g., AWS, Azure, Google Cloud) and their services
- Experience with containers and container orchestration (e.g., Docker, Kubernetes, ECS)
- Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform)
- Advanced knowledge of software application development and technical processes, with considerable in-depth knowledge in one or more technical disciplines (e.g., cloud, artificial intelligence, machine learning, mobile)
- Ability to present to and communicate effectively with senior leaders and executives
- Experience leading complex projects supporting site reliability engineering design, scaling, resilience, and system performance assessments
Preferred qualifications, capabilities, and skills
- Knowledge of one or more infrastructure disciplines, such as hardware, networking terminology, databases, deployment practices, integration, automation, scaling, resilience, and performance assessments
- Experience troubleshooting and solving problems involving complex data structures and algorithms