Job Responsibilities:
- Deliver business-facing technology application support to business and operations teams across the Asia Pacific region.
- Review and implement technology application and infrastructure releases and patches, and run disaster recovery exercises.
- Oversee production technology incidents until resolution, ensuring prompt engagement, escalation, and effective communication with business, technology, and vendor partners.
- Troubleshoot database issues and report findings.
- Enhance support documentation and procedures.
- Collaborate within a follow-the-sun support model with global counterparts.
- Serve as a Subject Matter Expert (SME) for key applications, upholding global best practices and hygiene standards.
- Conduct post-incident root cause analysis (RCA), and identify, track, and implement preventative measures.
- Contribute to the development of tools, frameworks, and techniques to boost productivity and quality in production support, applying SRE principles.
- Develop and support automation tools to enhance platform reliability and team productivity.
- Provide support during early APAC hours and work weekends on a rotational basis.
Required Qualifications, Capabilities, and Skills:
- Bachelor's degree in Engineering, Computer Science, or Information Technology.
- Over 10 years of experience in application support and production management.
- Strong database expertise, including writing SQL queries for data investigations, across databases such as Oracle, MS-SQL, PostgreSQL, and Cassandra.
- Experience with Unix and Windows operating systems, Kubernetes for container orchestration, and scripting languages such as Perl, Python, and Linux/UNIX shell.
- Proven track record in Production Support and SRE, with a clear understanding of SRE principles and methodologies, including running Incident and Problem Management calls for business-impacting outages, conducting post-incident analysis, and implementing preventative measures.
- Support management skills, including designing and using monitoring dashboards, generating service KPIs, reporting on service stability and performance, and monitoring logs and message buses.
- Experience in technology disaster recovery planning and test execution.
- Ability to drive issue resolution across different support teams.
Preferred Qualifications, Capabilities, and Skills:
- Knowledge of ITIL v4 is beneficial.
- AWS experience and certification, ideally covering core AWS services such as EC2, S3, EKS, RDS, and CloudWatch.
- Prior experience in Java development is advantageous.
- Experience with cloud platforms (e.g., AWS, Google Cloud) is advantageous.
- Familiarity with Splunk for log analysis and monitoring.
- Experience with telemetry and application performance monitoring (APM) tools such as Splunk, AppDynamics, Dynatrace, Grafana, and ITRS Geneos.
- Experience with job schedulers such as Control-M or AutoSys.