
eBay T25 Senior Platform Reliability Engineer
India, Karnataka, Bengaluru 
590184851

15.07.2025

Responsibilities:
  • Reliability & Performance: Design, implement, and maintain systems and processes to ensure the high availability, performance, and scalability of our production platform.

  • Automation: Develop and implement automation for infrastructure provisioning, deployment, monitoring, and incident response, reducing manual toil and improving operational efficiency.

  • Observability: Implement and enhance comprehensive monitoring, logging, and alerting solutions to provide deep insights into system health and performance.

  • Incident Management: Lead incident response efforts, conduct root cause analyses, and implement preventative measures to minimize future occurrences.

  • Capacity Planning: Collaborate with development teams to forecast resource needs and ensure the platform can handle anticipated growth and traffic spikes.

  • System Design & Architecture: Provide input on system architecture and design, advocating for reliability, scalability, and operational best practices from the outset.

  • Tooling & Infrastructure: Evaluate, select, and implement new tools and technologies to improve our platform's reliability, security, and operational capabilities.

  • Collaboration & Mentorship: Work closely with development, QA, and security teams to embed reliability practices throughout the software development lifecycle. Mentor junior engineers on SRE principles and best practices.

  • Documentation:

Qualifications:
  • Experience: 5+ years of experience in a DevOps, SRE, or similar role focused on platform reliability and operations.

  • Cloud Platforms: Strong hands-on experience with at least one major cloud provider (e.g., AWS, Azure, GCP).

  • Containerization & Orchestration: Expertise with Docker and Kubernetes for deploying and managing microservices.

  • Infrastructure as Code: Proficiency with IaC tools such as Terraform, CloudFormation, or Ansible.

  • Scripting & Programming: Strong scripting skills (e.g., Python, Bash) and experience with at least one compiled language (e.g., Go, Java) or a runtime such as Node.js for automation and tool development.

  • Monitoring & Alerting: Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic) and logging systems (e.g., ELK Stack, Splunk).

  • CI/CD: Solid understanding of, and hands-on experience with, CI/CD pipelines (e.g., Jenkins, GitLab CI, GitHub Actions).

  • AI Code Generation: Familiarity with foundational AI concepts and practical experience with AI-powered code generation tools (e.g., OpenAI Codex, GitHub Copilot, Anthropic Claude, Cursor, or Windsurf), or an understanding of transformer-based code generation, will be a significant asset.

  • Networking: Fundamental understanding of networking concepts (TCP/IP, DNS, load balancing, firewalls).

  • Databases: Familiarity with database operations, performance tuning, and backup/recovery strategies for both SQL and NoSQL databases.

  • Problem-Solving: Exceptional analytical and troubleshooting skills, with a methodical approach to identifying and resolving complex system issues.

  • Communication: Excellent verbal and written communication skills, with the ability to explain technical concepts clearly to diverse audiences.