Collaborate with internal customers and partners to deliver key business outcomes.
Ensure that cloud products are reliable, scalable, efficient, and compliant with eBay's security and operational standards.
Enhance observability practices to ensure comprehensive monitoring and alerting across cloud services.
Respond to cloud incidents, perform root cause analysis, and implement corrective actions to prevent future occurrences. Develop and maintain incident response plans.
Analyze system performance metrics and make recommendations for improvements. Implement changes to optimize resource utilization and improve application performance.
Drive improvements in CI/CD processes to increase deployment velocity and reliability.
Develop and maintain automation to streamline operations, reduce manual work, and enhance system reliability.
3+ years of programming experience with Go or Python.
5+ years of experience implementing large-scale, distributed, high-availability, fault-tolerant systems and infrastructure in a production environment.
Proficiency in delivering products within a multi-functional team environment.
Demonstrated expertise in observability tools and practices, ensuring system reliability and performance.
Extensive experience as an SRE working with Kubernetes or related cloud infrastructure and cloud-native technologies. Experience developing with Kubernetes and/or building Kubernetes controllers is highly desirable.
Deep understanding of API design and RESTful principles, with experience in building web services at scale.
Certifications in Kubernetes, lifecycle management, or related fields.
An understanding of application lifecycle management and CI/CD is a plus.
Experience in a high-traffic, large-scale environment.
Familiarity with additional programming languages or frameworks.
Proficiency in Agile development methodologies.
Experience participating in open-source standards and contributing to open-source projects is a plus.