Job Description:
This job is responsible for providing front-line support to end users, responding to issues related to incident and problem management governance for multiple applications, and leading triage activities on all business-impacting incidents. Key responsibilities include ensuring compliance with incident management and problem management policies and procedures, serving as a focal point for the customer, client, and associate experience, restoring complex production incidents under tight Service Level Agreements, and pursuing root cause and problem resolution follow-ups.
Responsibilities:
Leads production support triage efforts, manages bridge line troubleshooting, engages in technical research, and escalates issues to leadership as needed
Ensures all impacts are accurately recorded and documented in the system of record, oversees that documents and wikis are updated and available for use during triage, and supports the documentation of application flows, upstream/downstream impacts during outages, the customer experience, and contacts for support needs
Identifies and/or validates business impacts through interpretation of monitors, dashboards, and logs to communicate with leadership and vendors
Manages activities to identify incident root cause, resolution, preventative actions, and change requests, and reports on incident data quality
Promotes and enforces production governance during triage/testing and identifies production failure scenarios, vulnerabilities, and opportunities for improvement
Serves as a subject matter expert for applications within a portfolio, leveraging extensive knowledge of application functionalities and application flows
Assesses and prioritizes research requests, ad hoc reports, and offline incidents at the direction of senior team members and delegates work as needed to team members and peers
Required Qualifications:
Experience architecting a large-scale production database platform.
5+ years of experience in database management.
Strong knowledge of Postgres production and contingency replication features and configuration.
Strong knowledge of Postgres HA clustered environments.
Strong hands-on experience with failover/migration and data restores in an HA environment.
Ability to support database patching and provide continuous support for the application.
Proficient in handling crontabs and data backups using pgBackRest.
Proficient in handling Kubernetes/OpenShift clusters for PostgreSQL.
Experience creating and maintaining documentation and troubleshooting playbooks, and testing failover and recovery plans.
Perform regular database maintenance tasks.
Ability to write Ansible playbooks.
PostgreSQL DBA experience in a 24x7 production environment.
Desired Qualifications:
Proficient in one of the following scripting languages: Python or Bash.
Skills:
Production Support
Risk Management
Automation
Collaboration
Innovative Thinking
Solution Design
Solution Delivery Process
Stakeholder Management