Design and build the infrastructure for a global cloud service that comprises hundreds of thousands of MongoDB clusters, processes a billion metrics per day, and replicates tens of billions of database writes to our backup service
Design, implement, and troubleshoot the automation and monitoring of services that seamlessly span the globe, across several cloud providers
Become an expert in infrastructure performance, helping us optimize from the application level all the way through the firmware
Build for resilience. Our goal is that nobody’s pager goes off, ever. Are we there yet? No. Are we really close? Very. While we work on that, participate in a weekly on-call rotation
Improve our infrastructure capabilities, optimizing for cost, simplicity, and maintainability
Requirements
3+ years of experience running a mission-critical service at scale in a Linux environment
Firm grasp of at least one modern programming language, beyond basic scripting
Familiarity with web and network protocols and standards (HTTP, TLS, DNS, etc.)
Bachelor’s degree in Computer Science or equivalent experience
Experience writing automation tools and eagerness to "automate all the things"
Nice to have
Experience building large applications from scratch, complete with CI/CD infrastructure
Experience in networking, security, hardware, or OS performance tuning
Experience with at least one of the major cloud providers (Amazon Web Services, Google Cloud Platform, Microsoft Azure)
Experience managing Kubernetes clusters or other container orchestration infrastructure
Experience with observability of large-scale distributed systems