Job responsibilities
- Analyze and troubleshoot production application flows to ensure end-to-end application or infrastructure service delivery supporting the business operations of the firm
- Improve operational stability and availability through participation in problem management
- Monitor production environments for anomalies and address issues utilizing standard observability tools
- Assist in the escalation and communication of issues and solutions to business and technology stakeholders
- Identify trends and assist in the management of incidents, problems, and changes in support of full stack technology systems, applications, or infrastructure
Required qualifications, capabilities, and skills
- 3+ years of experience or equivalent expertise troubleshooting, resolving, and maintaining information technology services
- 5+ years of hands-on experience in production support or DevOps/SRE work in a large enterprise.
- Strong experience in a DevOps/SRE or support role, working with enterprise-scale CI/CD technologies, SDLC tools, and infrastructure/application monitoring technologies.
- Excellent problem-solving and collaboration skills, with the ability to analyze complex issues and provide effective solutions.
- Experience with Unix/Linux platforms and cloud platforms such as AWS, Azure, or GCP.
- Proficiency in one or more general-purpose programming languages (Java, Python, .NET, C++, etc.)
- Hands-on experience with CI/CD pipelines in a globally distributed hybrid environment (cloud and on-premises) using Git/Bitbucket/GitHub, Artifactory, and Jenkins.
- Experience with observability and monitoring/logging tools and techniques (Splunk, Grafana, Dynatrace, Elasticsearch, etc.)
- Support-level knowledge of containerization and orchestration technologies such as Docker and Kubernetes.
- Experience in Incident Management processes, communications, and tools (ServiceNow, Netcool, Convey, etc.)
- Excellent written and verbal communication skills.
Preferred qualifications, capabilities, and skills
- Experience with one or more general-purpose programming languages and/or automation scripting
- Working understanding of public cloud