Job responsibilities
- Applies technical knowledge and problem-solving methodologies to projects of moderate scope, with a focus on improving the data and systems running at scale, and ensures end-to-end monitoring of applications
- Resolves most nuances and determines the appropriate escalation path
- Executes conventional approaches to build or break down technical problems
- Drives the daily activities supporting the standard capacity process applications
- Partners with application and infrastructure teams to identify potential capacity risks and govern remediation statuses
- Considers upstream/downstream data and systems or technical implications
- Makes significant decisions for projects involving multiple technologies and applications
- Adds to team culture of diversity, equity, inclusion, and respect
Required qualifications, capabilities, and skills
- Formal training or certification on Infrastructure Engineering, Site Reliability Engineering and/or Software Engineering concepts and 3+ years of applied experience
- Experience supporting products and interacting with internal customers
- Strong knowledge of one or more infrastructure disciplines such as hardware, networking terminology, databases, storage engineering, deployment practices, integration, automation, scaling, resilience, and performance assessments
- Strong knowledge of one or more scripting languages (e.g., shell scripting, Python)
- Strong verbal and written communication skills, with the ability to drive meetings and knowledge-sharing sessions for teams
- Experience with multiple cloud technologies with the ability to operate in and migrate across public and private clouds
- Drive to develop infrastructure engineering knowledge of additional domains, data fluency, and automation knowledge
- Knowledge of and hands-on experience with tools such as Jira, Confluence, ServiceNow, and Netcool
Preferred qualifications, capabilities, and skills
- Hands-on experience with AWS, Azure, GCP, or other cloud environments, including certifications
- Hands-on experience with Terraform or other infrastructure-as-code technologies
- Hands-on experience with CI/CD pipelines
- Hands-on experience with GitHub and code reviews
- Hands-on experience with DevOps practices, using Python or other scripting for automation
- Knowledge of observability tools such as Grafana, Dynatrace, Apica, and Splunk
- Knowledge of SRE best practices