What you will do:
Manage, deploy, and operate cloud solutions at scale using the principles of Site Reliability Engineering
Participate in the design and development of new features to enable Data-as-a-Service
Design and write automation software to provision, upgrade, monitor, and heal Data-as-a-Service offerings
Identify single points of failure and other high-risk architecture issues; propose and implement more resilient solutions
Define Service Level Objectives (SLOs) and implement them, along with supporting runbooks
Participate in product release cycles: deploying code to integration, staging, and production environments, and integrating with CI/CD tooling, monitoring, and change management
Interact with automated monitoring and healing infrastructure to ensure healthy environments
Support and develop peers through knowledge sharing, mentoring, and collaboration
Create and maintain standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes and remediating problems in our environment
Participate in a follow-the-sun on-call rotation
Contribute software tests and participate in peer review to increase the quality of our codebase
What you will bring:
5+ years of software engineering experience using object-oriented languages; Golang or Python preferred
5+ years of experience developing and managing Infrastructure as Code (IaC) automation platforms; Terraform preferred
3+ years of experience in troubleshooting as-a-service offerings (SaaS, PaaS, etc.)
2+ years of experience with a major public cloud provider; AWS preferred
1+ year of experience with Kubernetes
Prior experience with Snowflake or Fivetran is a plus
Experience working directly with and presenting to stakeholders
Ability to quickly learn new technologies and follow industry trends
Excellent communication, presentation, and writing skills