Senior Engineer, Payments Tech Operations
About Payments TechOps:
The vision of Payments TechOps is to empower Payments Engineering teams to deliver exceptional payment experiences to our Guests and Hosts through a foundation of Operational Excellence. TechOps achieves this by:
- Driving ‘Customer-Centric Payments Observability’ through flow-level monitoring
- Driving ‘Customer Excellence’ by managing all customer service escalations (including potential ones) for the Payments Engineering teams
- Enhancing the ‘Engineering Productivity’ of the Payments Tech teams through proactive automation and tech-enabled processes
What you’ll do:
- Drive the flow-level observability strategy, including instrumentation and operations, to enhance detection and mitigation capabilities
- Drive initiatives independently to fix root causes identified from repeat issues observed across monitoring platforms; challenge the status quo and follow through to completion
- Build proactive alerting and real-time monitoring tools to help identify issues early and, in collaboration with Product Engineering teams, resolve them in a timely manner
- Develop observability standards and frameworks for new product readiness to ensure service reliability in SOA and distributed systems
- Build domain expertise to achieve scalability by understanding the nuances of Payments across processing, compliance and infrastructure
- Drive large-scale migration and adoption projects on Observability & Reliability by collaborating across various Payments teams
- Collaborate with a large set of stakeholders across engineering, infrastructure and operations teams to align on and implement foundational and operational programs
- Automate our alert configuration across various observability tools (e.g. Watchpoint, Kibana, Datadog) that work across signals: metrics, logs and traces
- Bring ideas to life (i.e. production) to help make the lives of engineers better
- Advocate for and implement reliable design patterns (circuit breakers, graceful degradation, endpoint monitoring, etc.)
- Partner with the broader Airbnb organization to learn from incidents through a blameless post-mortem process
- Automate as much as humanly possible and always configure as code
About You:
- 8+ years of technical experience, with 5+ years of relevant industry experience in a fast-paced tech environment
- Experience in building and implementing Observability/SRE, along with expertise in building availability and reliability tools in a similar environment
- Experience in driving E2E SRE initiatives (L2/L3) and improving observability & reliability, preferably in the payments space
- You have strong working knowledge of observability tools (e.g. Datadog, OpenTelemetry) and SRE practices
- Experience in application and tool development (Java, .NET, Python) in microservices environments. Previous experience in AI/ML is a plus.
- Experience with initiatives across auto-scaling, self-healing mechanisms, chaos engineering and performance optimization techniques is a plus
- You have excellent communication skills and the ability to work well within a team and with teams across time zones
- You are a strong problem solver and have previously worked on a team that is on call for production systems
- Technical leadership: hands-on experience leading project teams and setting technical direction and strategy
- You are passionate about efficiency, availability, and technical and system quality
Offices: Bangalore, India