Job responsibilities
- Uses monitoring and analytics tools such as Grafana and Dynatrace to build a capacity management risk profile view for applications, providing insights into infrastructure capacity and application performance that enable the remediation of bottlenecks.
- Contributes to the tooling strategy for Infrastructure Capacity Management and Planning, focusing on forecasting and predicting infrastructure capacity limits in both private and public cloud environments.
- Collaborates with SRE LOB teams to support Infrastructure Capacity Management in CIB, focusing on compliance, quality, and standardization.
- Works with development and operations teams to ensure infrastructure scalability and reliability, leveraging Low Code Platforms to streamline processes.
- Assists in the implementation of capacity planning solutions to provide real-time insights into infrastructure utilization and performance.
- Understands service level indicators and uses service level objectives to proactively resolve issues before they impact customers.
- Supports the adoption of site reliability engineering best practices within the team.
Required qualifications, capabilities, and skills
- Formal training or certification on software engineering concepts and 3+ years applied experience
- Proficiency in site reliability culture and principles, and familiarity with how to implement site reliability within an application or platform
- Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET
- Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.)
- Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
- Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
- Familiarity with containers and container orchestration technologies such as ECS, Kubernetes, and Docker, along with troubleshooting common networking technologies and issues
- Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision
- Ability to proactively recognize roadblocks and a demonstrated interest in learning technology that facilitates innovation
- Ability to initiate and implement ideas to solve business problems
Preferred qualifications, capabilities, and skills
- Experience with cloud-native environments, primarily in AWS.
- Familiarity with industry-wide technology trends and best practices.
- Experience with Low Code Platforms.