What you will accomplish:
Proactive Monitoring: Continuously monitor the health of eBay's critical services to identify and address potential issues before they escalate.
Solution Development: Collaborate with Architecture, Engineering, and Operations teams to develop solutions that ensure high site availability, reliability, and performance.
Collaborative Problem Solving: Work closely with partner teams to resolve recurring technical issues, onboard new alerts, and develop high-quality Standard Operating Procedures (SOPs).
Automation and Process Enhancement: Identify and implement opportunities to increase automation and reduce manual workload, improving overall efficiency.
Enhance Monitoring Tools
Incident Management
What you will bring:
BSCS or related 4-year technical degree
3 years of experience in large-scale internet/server environments, including cloud computing and multi-tier architectures.
Experience delivering solutions using software engineering skills in languages such as Java, Python, and Go.
Strong incident management and leadership skills, with excellent technical triage and troubleshooting abilities, especially during crises.
Expert knowledge in large-scale web operations, including web-based Java/J2EE architectures, JVM configurations, and a deep understanding of UNIX, Linux, networking (TCP/IP), and databases (both relational and NoSQL).
Experience debugging Android and iOS applications.
Experience with observability tools such as Grafana and Prometheus, and skills in documenting procedures for knowledge management.
Strong interpersonal and communication skills to thrive in fast-paced, dynamic environments.