Responsibilities
- Provide L1 and L2 Operational Support for the bank’s Fault Management toolset
- Implement Fault Management tool upgrades and changes in the production environment.
- Assist with the development of change test plans, execute tests and document results.
- Coordinate Fault Management tool server/database/application patching
- Certify new product models/versions and obtain approval for their use at the bank through all governance processes
- Develop and maintain event catalog and policies for fault monitoring
- Document Fault Management toolset support processes and procedures
- Assist with Audit deliverables for Fault Management toolset
- Audit managed device inventory of Fault Management toolset
- Coordinate device seeding and decommission remediation activities
- Perform Disaster Recovery exercises for Fault Management toolset
- Onboard and test new tools for Operational Support by Network Tools Operations
- Submit Change Requests for Fault Management toolset, coordinating and managing the complete Change lifecycle
- Manage Unix OS and hardware spanning various operating systems, and handle Production Changes with the necessary stakeholders involved.
- Coordinate closely with the Tools Engineering team to build, test and maintain the Network Fault Monitoring systems; build self-healing automation with engineering teams; write test plans, execute tests and document results in the Network Tools and Engineering space.
- Assist with Audit deliverables for Network Toolset
- Work with Engineering team and Vendors to enhance their respective tools to meet new customer/security requirements and remediate open risk items, vulnerabilities, patches, etc. within the Fault Monitoring toolset
- Refresh/Upgrade existing toolset as it becomes EOL/EOS or Not-Permitted within the bank.
- Work with Engineering team to develop and test integrations for the various Network Tools
- Lead Disaster Recovery exercises for various Network Tools
- Provide Break-Fix Support for Network Toolset
Requirements
Experience range: 8+ years
Foundational Skills
- Strong knowledge of system and networking concepts
- Overall 10+ years of experience in Fault/Event Management, with at least 5 years of experience on a Fault Monitoring toolset, preferably IBM Watson/IBM Netcool/NOI Suite
- At least 3 years of experience with IBM Netcool Operations Insight and AIOps platform
- Demonstrated experience working with help desks and technical staff of multiple customers, for example: state agencies, commissions, boards, cities, and counties for the purpose of troubleshooting service issues.
- Demonstrated experience in incident response, problem analysis, investigation, and thorough root cause analysis.
- Strong process and procedure documentation skills.
- Strong knowledge of Windows & Linux operating systems, and networking concepts.
- Experience in event acknowledgment, creating tickets, and follow-up with support teams.
- Experience in Level 1/Level 2 troubleshooting for Network toolsets before raising tickets with support teams.
- Experience in developing and testing Netcool Impact custom policies and ObjectServer triggers to integrate with third-party systems
- Experience working rotational shifts in Operations and Support
- Ability to communicate status through verbal and written reports as required
- Ability to review maintenance windows and validate suppression events.
- Review and document issues in event management life cycle processes
- Respond to alerts according to Standard Operating Procedures
- Generate service event consolidation reports
- Experience in developing and testing probe custom rules
- Experience configuring network discovery across multi-region environments, custom AOC files and custom network views
- Strong hands-on experience with Syslog, MTTrapd and MessageBus probes, and JDBC and XML gateways
Desired Skills
- Bachelor’s degree in engineering, computer science or a related field, and/or technical training.
- Prior experience with SevOne Network Performance Management (building maps/links, network discovery, poll policies, object grouping concepts), Red Hat OpenShift container platform and Cognos Analytics highly preferred
- Experience building Runbook Automations
- Functional knowledge of Network components such as switches, routers, firewalls, proxies, load balancers etc. Understanding of inter-dependencies of various network components
- Self-starter, self-directed and shows initiative.
- Focused on execution, delivery, and commitment to dates.
- Scripting skills desired: Python, Ansible, Shell, JavaScript
- Familiarity with the Software Development Life Cycle (SDLC) process and Agile Project delivery
- Sound engineering foundation with network and systems troubleshooting skills.
- Linux/Unix/Windows server administration
- Working knowledge of Active Directory
- Can tie strategy and actions to business impact and results.
- Demonstrates ownership: Is accountable and can hold others accountable (professionally)
- Strong written and verbal communications skills. Ability to communicate and influence upward as well as laterally.
- Organized and detail oriented; must have a strong technical acumen
- Ability to partner with others for the good of the initiative
- Experience with various Fault Monitoring tooling vendors including but not limited to IBM, SevOne, Micro Focus and Splunk, as well as equipment manufacturers such as Cisco, Aruba, CloudGenix and Fortinet.
- Financial services experience (Insurance, Banking, Investment banking) is a plus.
Work Timings: Shifts (Shift 1: 7:30 AM to 4:30 PM IST; Shift 2: 12:30 PM to 9:30 PM IST), any 5 days a week including weekends
Location: Mumbai, Hyderabad or Chennai