As a Site Reliability Engineer, you will be responsible for providing the platform for mission-critical ad-tech systems to maintain constant uptime, scale seamlessly, and allow for new applications and services to flourish.
5+ years of experience supporting internet-facing production services and distributed systems.
Good programming skills in at least one of Java, Python, or Go.
Expertise in operating Linux-based systems, with a proven understanding of Linux internals.
Experience in building and scaling distributed systems in a public, private, or hybrid cloud environment.
Understanding of core SRE concepts: monitoring, alerting, and incident management.
Passion for eliminating repetitive manual processes through automation and improving them through iteration.
Experience building and running infrastructure on AWS, including services like EKS, MSK, and ElastiCache.
Experience with Infrastructure as Code tools such as Terraform.
Passion for customer privacy.
Experience leading deep dives and troubleshooting production issues on active diagnostic calls.
Demonstrated problem-solving ability using creative and innovative thinking, with a strong sense of ownership, customer service, and integrity, shown through clear communication.
Experience building and operating infrastructure at scale.
Experience building solutions that reduce friction in software delivery.