

Being the cybersecurity partner of choice, protecting our digital way of life.
Your Impact
Optimize infrastructure costs by monitoring resource utilization, rightsizing instances, and reducing waste to improve cost-efficiency.
Define and manage service-level objectives (SLOs) and related metrics to ensure service reliability and align with business goals.
Design and maintain secure cloud infrastructures that prioritize reliability, scalability, and efficiency.
Develop expertise in new technologies to enhance infrastructure and operations.
Collaborate with cross-functional teams to ensure applications are production-ready and highly available.
Automate deployments, monitoring, and alerting to streamline operations and improve reliability.
Diagnose and resolve critical issues, driving optimization and continuous improvement.
Participate in on-call rotations to support seamless service operations.
Contribute to design reviews to enhance system performance and scalability.
Your Experience
Creative thinker and collaborative team player with strong communication skills and a drive to make a meaningful impact.
Cloud and Infrastructure: Expertise in provisioning and managing cloud infrastructure on public or private cloud platforms (GCP, AWS, or Azure preferred), with strong proficiency in tools like Kubernetes, Terraform, and Ansible.
Database Operations: Proficiency in managing and optimizing SQL and NoSQL databases, including operational tasks such as provisioning, scaling, monitoring, backups, and troubleshooting. Experience with platforms like MongoDB, Redis, PostgreSQL, and MySQL is preferred.
System Reliability: Deep understanding of distributed systems, high-availability architecture, and strategies for scaling and optimizing system performance.
Service-Level Management: Proven experience defining and managing SLAs, SLOs, and SLIs to ensure service reliability and business alignment.
Cost Optimization: Expertise in monitoring and optimizing cloud infrastructure costs, including resource allocation and implementing efficient practices.
Load Balancing and Networking: Hands-on experience with Envoy or similar load balancing technologies, along with strong Linux system administration and advanced network troubleshooting skills.
Automation and Development: Advanced skills in programming and automation using Python, Golang, or shell scripting to streamline operations and enhance system reliability.
Production Deployment and Best Practices: Proven experience managing production deployments, ensuring system stability, and enforcing DevOps best practices.
Monitoring and CI/CD: Familiarity with CI/CD pipelines (GitLab CI preferred) and expertise in designing robust monitoring and alerting systems.
Collaboration and Communication: Exceptional ability to work with cross-functional teams, communicate effectively, and provide technical leadership.
Mindset and Motivation: Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive. Passionate about infrastructure and monitoring as code.
Education and Experience: BS/MS in Computer Science, Computer Engineering, or a related field, with 3+ years of hands-on industry experience in Site Reliability Engineering or a similar role managing and improving complex systems at scale.
Compensation Disclosure
The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be between $120,000 and $195,000 per year. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found .
All your information will be kept confidential according to EEO guidelines.