Expoint - all jobs in one place

Citi Group Observability Dev Ops Lead Engineer - AVP C12 Chennai 
India, Tamil Nadu, Chennai 
552625497

03.09.2024

The role requires a strong focus on communication and technical skills, system stability, quality and functionality against user expectations, and problem management and resolution, including issue documentation, root cause analysis and trend analysis. Ensure all processes and procedures are always followed to comply with audit and regulatory requirements.

Responsibilities:

  • Drive engineering and certification of event management and infrastructure monitoring platform products.
  • Develop custom tooling extensions and integrations of monitoring services with external systems, such as CMDBs, ticketing and notification systems, and larger data lake technologies, and deliver them with automated SRE tooling functions.
  • Work closely with engineering, operations, and applications teams across Citi to understand and collect monitoring and data analytics requirements.
  • Provide operational support to existing Event and notification systems and build SRE automations to manage production support functions.
  • Use a good understanding of apps support procedures and concepts, and basic knowledge of other technical areas, to field issues and queries from stakeholders, provide short-term resolutions, and work with relevant technology partners on long-term remediation.
  • Develop a comprehensive understanding of how areas of apps support collectively integrate to contribute to achieving business goals.
  • Participate in disaster recovery testing.
  • Participate in application releases, from development and testing through deployment into production.
  • Perform post release checkouts after application releases and infrastructure updates.
  • Develop and maintain technical support documentation.
  • Analyze applications to identify risks, vulnerabilities, and security issues.
  • Make evaluative judgments based on analysis of information, and resolve problems by identifying and selecting solutions.
  • Cooperate with Development colleagues to prioritize bug fixes and support tooling requirements.
  • Active involvement in and ownership of Support Project items, covering Stability, Efficiency, and Effectiveness initiatives.
  • Co-ordinate with vendor management for any issues / new developments and have frequent meetings to address all the gaps.
  • Proactively check and remediate all CAMP / FEMA / CISAR / Black Duck / VTM alerts to remain compliant.
  • Be willing to be cross-trained on other applications within event management, such as SMRP / NOI, and provide end-to-end support as and when required.
  • Understand Ansible playbooks and Starfleet functionality.
  • Ensure true end-to-end ownership of the production environment, exceeding stability targets through collective ownership of initiatives across all plan, build, and operate functions.
  • Collaborate with engineering teams in the improvement of CI/CD practices.
  • Support and maintenance of infrastructure to include cloud deployments.
  • Collaborate with development, QA, and engineering teams globally on various business projects.
  • Provide timely and regular communication, and overall project reporting within team, business partners, business leadership and engineering leadership.
  • Communicate effectively across various levels of management, technology, architecture, and compliance.
  • Study historical performance trends using dashboards, data, charts, etc.
  • Handle all assigned JIRAs and make sure they are driven and completed on time.
  • Perform Incident, Change and Problem management, including prioritization, root cause analysis, and escalation to and coordination with appropriate groups.
  • Provide on-call support on a rotational basis during weekends, or when required, for the applications.

Qualifications:

  • 9+ years of experience in the IT infrastructure domain, with a minimum of 5 years engineering 3rd party software and services in the event management notification tooling space.
  • Proficient in Agile work methods and JIRA based workflow management.
  • Knowledge of Ansible playbook automation.
  • Knowledge of YAML will be an added advantage.
  • Proficient in shell scripting, Perl or Python.
  • Strong skillset required in Linux (RHEL) and Windows; OS concepts and services; regular expressions; end-to-end testing; performance/scalability testing.
  • Must have exposure to Incident Management tools like ServiceNow or any similar market application.
  • Experience with software delivery and documentation tools like Bitbucket, Artifactory, Jenkins and Confluence.
  • Knowledge of databases like MSSQL, Oracle, or MongoDB will be a plus.
  • Knowledge of Event Analytics, Data Science, Machine learning and Artificial intelligence will also be a plus.
  • Open minded and willing to learn new tools/methodologies/concepts.
  • Experience with cloud platforms like AWS, Azure or Google Cloud is a plus.

Education:

  • Bachelor’s degree/University degree or equivalent experience

Time Type:

Full time
