As a Technology Support III team member in Employee Platforms, you will ensure the operational stability, availability, and performance of our production application flows. Encourage a culture of continuous improvement as you troubleshoot, maintain, identify, escalate, and resolve production service interruptions for all internally and externally developed systems, leading to a seamless user experience.
Job responsibilities
- Provide end-to-end application or infrastructure service delivery to enable successful business operations of the firm
- Support the day-to-day maintenance of the firm’s systems to ensure operational stability and availability
- Assist in monitoring production environments for anomalies and address issues using standard observability tools
- Identify issues for escalation and communication, and provide solutions to the business and technology stakeholders
- Execute standard software solutions, design, development, and technical troubleshooting
- Design, develop, code, and troubleshoot with consideration of upstream and downstream systems and technical implications
- Support release management and the SDLC, including Jules and Jenkins pipeline deployments
- Provide primary operational, optimization, and engineering support for infrastructure across on-prem, private cloud, and public cloud platforms
- Follow established processes to report defects, and track and analyze test results systematically so that audit trails and lineage are always maintained
- Support incident management and follow through to root cause analysis (RCA) findings and resolution; this requires diverse coordination skills
- Analyze complex situations and trends to anticipate and solve incident, problem, and change management issues in support of full stack technology systems, applications, or infrastructure
Required qualifications, capabilities, and skills
- 3+ years of experience or equivalent expertise troubleshooting, resolving, and maintaining information technology services
- Demonstrated knowledge of applications or infrastructure in a large-scale technology environment, both on premises and in the public cloud
- Experience in observability and monitoring tools and techniques
- Familiarity with troubleshooting common networking technologies and issues
- Proficiency in site reliability culture and principles, with familiarity in implementing site reliability within an application or platform
- Experience in observability, including white and black box monitoring, service level objective alerting, and telemetry collection using tools like Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
- Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
- Familiarity with container and container orchestration technologies such as ECS, Kubernetes, and Docker
Preferred qualifications, capabilities, and skills
- Experience with one or more general purpose programming languages and/or automation scripting, such as Python, Java/Spring Boot, or .NET
- Experience in cloud computing, preferably with AWS