Professional experience in Site Reliability Engineering, DevOps, or a related field.
Experience working with cloud compute environments such as OpenStack, AWS, GCP, or Azure
Experience with infrastructure as code (IaC), configuration management, CI/CD, and automation, e.g., Terraform, Pulumi, CloudFormation, CDK, Ansible, Chef, Puppet, Jenkins
Strong proficiency in software development using Python, Rust, and/or Go programming languages
Preferred Qualifications
Bachelor’s degree in Computer Science or a related field, or equivalent practical experience
Extensive experience administering, performance-tuning, and troubleshooting Linux systems
Excellent troubleshooting, problem solving, and debugging skills
Ability to cultivate an environment that emphasizes collaboration, accountability, and excellence
Excellent written and verbal communication skills
Ability to work under pressure and manage difficult situations in a dynamic work environment
Thrives in a fast-paced environment and adopts a learning mindset; loves learning new technologies
Proficiency in implementing and correlating telemetry using monitoring and observability tools such as Splunk, Grafana, Prometheus, ELK, or Sumo Logic
Experience in shell scripting (e.g., bash/tcsh/zsh)
Experience administering systems in large-scale environments
Experience with measuring, analyzing, and optimizing performance
Experience working with Agile development methodologies such as Scrum
Strong understanding of concurrency, parallelism, and distributed system concepts
Passion for high-quality code, unit tests, documentation, and production services
Previous experience working on a global team with a 24/7 support model
Experience building and operating containerized systems and microservices (e.g., Docker, Kubernetes, Vagrant)