Site Reliability Engineers take ownership of reliability, scalability, automation, and other aspects of the uptime and availability of our database services. You will need strong skills in the following areas:
Design, write and build tools to improve the reliability, availability and scalability of our Cassandra NoSQL Cloud Database Platform Offerings.
Augment existing instrumentation to build a cohesive picture of the characteristics of our systems with special attention to points of failure.
Design and develop resilience-focused improvements to our production systems to achieve and surpass SLOs.
Help improve our operational practices to minimize service disruptions.
Work with engineers to identify root causes and fix issues.
Influence, design and create new architectures, standards and methods for large-scale enterprise systems.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Bachelor’s degree in Computer Science or a related field
Technical expertise: 6-9 years
Understand IT processes including architecture, design, implementation, and operations
Solid grasp of Cassandra architecture and administration, including design, provisioning, upgrades, operations, backups, security, and performance
Experience with CI/CD frameworks and tools such as Git and Jenkins
Experience with automating database tasks using Python, Database Lifecycle Management, and Cassandra OpsCenter
Solid grasp of building scalable databases using hybrid cloud infrastructure
Experience with configuration automation using Ansible
Hands-on experience with a public cloud such as AWS, GCP, or Azure; Kubernetes preferred