Job Purpose
This is a 24x7 environment and the position requires shift rotation and/or weekend work.
Responsibilities
- Monitoring and Incident Management
  - Monitor systems and applications within the production environment
  - Diagnose and fix incidents raised through monitoring tools, conference bridges and chats
  - Work with and escalate to internal and external teams to implement incident fixes, workarounds and data recovery
  - Open and update production incident tickets according to company standards
  - Quickly assess issues and provide management with well-conceived short- and long-term actions to restore services
  - Communicate, escalate and perform root-cause analysis of production incidents
- Problem Management
  - Investigate incidents and update tickets with the root cause and incident description, ensuring follow-up tickets for corrective actions are assigned
  - Manage incident tickets to closure, ensuring incident details are complete and accurate and all corrective actions have been completed
  - Troubleshoot day-to-day customer issues and provide direct support to clients
  - Document troubleshooting and escalation procedures
- System and Application Production Readiness
  - Work with internal and external teams to expand and maintain operational runbooks and other documentation
  - Check application and infrastructure availability and task status at scheduled times
  - Configure monitoring tools and alarms
- Deployment Management
  - Production deployments: approve and execute production deployment tasks
- Participate in disaster recovery, business continuity and workplace recovery events
- Participate in continuous improvement programs, such as trend analysis of recurring issues
- Collect and report on performance metrics for the environment
- Follow the documented handover process to bring the next shift up to speed and highlight priority items or issues
- Communicate important information about system maintenance, changes and events to clients, and address concerns regarding any aspect of the services
- Understand the various trading and clearing platforms and apply technical knowledge to improve system performance and reliability
- Participate in the on-call support schedule to ensure adequate business support coverage during core hours, and provide additional out-of-hours coverage for deployment and continuity test activities
Knowledge and Experience
- Bachelor’s degree in an IT-related field, or experience in IT systems support and/or operational support of applications and databases in a Linux/Unix environment
- Proficiency in Bash scripting and working knowledge of a broad range of Linux core utilities
- Working knowledge of networking, specifically TCP and UDP
- Understanding of systems architecture and design
- Strong communication skills
- Strong general IT skills, including email and Microsoft Office applications
- Able to think logically and critically
- Analytical problem-solving skills with an ability to identify root cause(s)
- Able to work as a team player across the organization
- Able to build and maintain effective relationships with individuals and the team as a whole
- Able to stay organized and decisive under pressure
- Customer-focused and dedicated to the best possible user experience
- Excellent time management skills
- Able to manage priorities and multi-task
- Self-confident and assertive
- Demonstrated reliability, flexibility and attention to detail
- Scheduling flexibility required
Preferred
- Experience in or understanding of financial markets, trading, and clearing systems
- Experience with FIX Protocol
- Basic knowledge of Java coding and debugging; ability to review logs and stack traces to debug issues
- Basic Unix Shell scripting skills
- Experience with enterprise monitoring solutions
- Understanding and working knowledge of TCP/IP, UDP and Multicast technologies
- Working knowledge of internetworking and various LAN/WAN technologies
- Experience with any enterprise incident management software, and knowledge of BigPanda and PagerDuty
- Experience with IBM MQ and Kafka
- Experience with a job scheduler such as Tidal
- Working knowledge of router, switch, firewall and proxy technologies, and of services such as Apache, DNS and LDAP