Job responsibilities
- Analyze and troubleshoot production application flows to ensure end-to-end application or infrastructure service delivery supporting the business operations of the firm
- Improve operational stability and availability through participation in problem management
- Monitor production environments for anomalies and address issues utilizing standard observability tools
- Assist in the escalation and communication of issues and solutions to the business and technology stakeholders
- Identify trends and assist in the management of incidents, problems, and changes in support of full stack technology systems, applications, or infrastructure
- Effectively manage incidents, taking ownership until closure, providing regular updates, and minimizing downtime and disruption
- Support start-of-day, end-of-day, and intraday checks
- Provide end-to-end application or infrastructure service delivery to enable successful business operations of the firm
- Support the day-to-day maintenance of the firm’s systems to ensure operational stability and availability
- Assist in the monitoring of environments for anomalies and address issues utilizing standard observability tools (e.g., Geneos, Dynatrace, Splunk)
- Identify issues for escalation and communication, and provide solutions to the business and technology stakeholders
Required qualifications, capabilities, and skills
- Experience or equivalent expertise troubleshooting, resolving, and maintaining information technology services
- Knowledge of applications or infrastructure in a large-scale technology environment on premises or public cloud
- Exposure to observability and monitoring tools and techniques
- Familiarity with processes in scope of the Information Technology Infrastructure Library (ITIL) framework
- Able to triage, debug, and troubleshoot complex distributed applications, which requires deep knowledge of running SQL queries and investigating log files from the Unix command line
- Experience working in a relational database environment (e.g., Sybase, Oracle, DB2), including Oracle, SQL, and Unix (commands, scripting), plus ITIL process knowledge
- Able to take application support ownership of 7+ distributed applications
- Working experience in one or more general purpose programming languages (Java, Python, Bash)
- Proficiency in observability, including white-box and black-box monitoring, SLO alerting, and telemetry collection, using tools such as Geneos, Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Hands-on experience working in a Kubernetes (K8s) environment, with expertise in running K8s commands
- Practical cloud native experience
Preferred qualifications, capabilities, and skills
- Knowledge of one or more general purpose programming languages or automation scripting