Lead and mentor a team of reliability engineers, fostering a strong culture of collaboration and continuous improvement.
Conduct regular one-on-one meetings with team members, providing guidance, feedback, and support for their career development.
Manage performance evaluations, provide constructive feedback, and actively participate in all phases of growing the engineering organization, including recruiting and team building.
Reliability Engineering, Operations & Governance
Lead and coordinate engineering activities to plan, communicate, and deliver product features on time, designing for quality, observability, and scalability.
Ensure instrumentation across the full software lifecycle, from requirements ideation through development to deployment.
Drive the adoption of cloud-native technologies and standard processes, such as containerization, service meshes, and microservices.
Collaborate with internal partners and team members:
Reliability engineering and operations teams, product management, and the PMO on engineering resource allocation and project schedules, aligned with our strategic organizational priorities.
SRE team to champion automation that improves efficiency and reliability.
Operations teams on maintaining a highly available telemetry and command/control infrastructure to ensure eBay’s products and services are available to our customers.
Fleet management team on capacity planning, resource allocation, and cost optimization for the telemetry control plane.
Information security teams to ensure the integrity and compliance of the telemetry infrastructure by implementing appropriate security controls and monitoring.
What you will bring:
12-15 years of proven experience in infrastructure, software development, and engineering organizations, including 5 years managing and leading both reliability engineering and software development teams.
Excellent at communicating critical updates, including AI-driven reliability trends and insights, to organizational leaders and executives.
Experience supporting medium or large tech organizations with many different internal customers and partners.
Experience working collaboratively in large, globally distributed teams.
Demonstrated ability to adopt and operationalize emerging AI tools, ensuring the team remains at the forefront of reliability engineering practices.
Knowledge of software development, networking, security, and storage technologies in cloud environments, and a proven understanding of cloud-native architectures, microservices, and DevOps and SRE principles.
Passion for staying ahead of the curve in AI/ML innovation applied to observability, monitoring, and system reliability.