The application window is expected to close on January 30, 2025. The job posting may be removed earlier if the position is filled or if a sufficient number of applications is received.
The successful applicant will work in U.S. Government classified environments and therefore must be a U.S. Person (i.e., U.S. citizen, U.S. national, lawful permanent resident, asylee, or refugee).
Your Impact
As a Site Reliability Engineer, you'll operate the next generation of our federal Cloud Security suite products, which run in an AWS GovCloud-native environment. This charter enables Umbrella to expand its access to U.S. government and other public sector market opportunities. Our current focus is on the platform's FedRAMP (Federal Risk and Authorization Management Program) authorization.
You will own the production environment from top to bottom, from deployments through observability and incident response. You'll work with customer support, our partners, and developers to keep our services running smoothly. This includes responding to alerts, running and improving our playbooks, and streamlining our processes to improve efficiency.
Your typical day will vary, but this is what you will be doing:
- Developing and maintaining complex infrastructure
- Building platforms that provide key services to internal and external collaborators
- Designing solutions for scalability and reliability
- Ensuring compliance with security controls
- Building end-to-end documentation and instrumentation of our platform to ensure visibility, automation, self-healing, and resiliency throughout the stack
- Triaging, solving, and addressing production problems in every layer of the stack
- Collaborating with other engineers on the team to cultivate sound engineering principles and represent our engineering values
Minimum Qualifications:
- 4+ years AWS infrastructure experience
- 3+ years on-call experience including incident response, provisioning, and deployments
- 3+ years of operational experience including continuous delivery and deployment (CI/CD) and cloud automation
- 2+ years of Linux systems experience (Ubuntu, Debian, etc.)
Preferred Qualifications:
- U.S. security clearance (especially T4 clearance) is a huge plus!
- Experience debugging application and infrastructure integrations in complex cloud environments
- Experience with containerized environments, including Kubernetes, Docker, etc.
- Experience working with end users and technical support on customer concerns
- Experience collaborating with development teams to resolve issues efficiently
- Expertise in the administration of enterprise-grade infrastructures like AWS, Azure, etc.
- Background in infrastructure-as-code tools like Terraform, Ansible, CloudFormation, etc.
- Strong security and networking skills, including work with tools like Wireshark, tcpdump, and Nmap
- Strong scripting skills in Golang, Python, and bash
- Ability to participate in a 24/7 on-call rotation
- Experience in monitoring and analyzing infrastructure using tools such as Datadog, CloudWatch, Grafana, etc.
- Experience with CI/CD tools like Jenkins, GitLab, ArgoCD, etc.
- An urge to document everything so that no one has to learn the same thing twice
- Ability to identify priority issues and escalate them to the appropriate channels
- An enthusiastic, go-for-it demeanor: when you see something broken, you can't help but fix it