Expoint - all jobs in one place

Cisco Site Reliability Engineer Cloud Agent Operations 
United States, California, San Francisco 
477635879

12.06.2024

We don't expect everyone will have everything listed here. Don't hesitate to apply if the role looks interesting, you have a good foundation, and are enthusiastic about advancing your understanding of these and other technologies.

We are searching for engineers who...

  • Can design and implement scalable, well-tested solutions, with a focus on Kubernetes and Terraform deployments.

  • Enjoy writing high-quality code in Python, Go, or similar programming languages.

  • Understand how to apply Infrastructure as Code (IaC) with technologies such as Terraform, Ansible, Puppet, and Kubernetes to build complicated systems while working to limit complexity.

  • Have experience with cloud providers' managed services, especially (but not limited to) AWS.

  • Possess a solid understanding of Unix/Linux systems, the kernel, system libraries, file systems, and general Linux administration.

  • Have knowledge of standard network protocols such as IPv4, IPv6, TCP, UDP, DNS, HTTP, and TLS.

  • Are able to communicate and document complicated topics in concise and consumable language for readers of diverse levels of understanding.

  • Have a strong sense of ownership, drive, and close attention to detail.

  • Have a dedication to excellence in both operations and development.

What You'll Do

  • Collaborate with software engineers across the engineering organization to quickly and accurately identify software bugs and provide pointers on performance or architecture improvements.

  • Design, build, deploy, and maintain a custom infrastructure deployment model, built from the ground up, that includes bare-metal servers and virtualized infrastructure from all major and mid-level cloud providers.

  • Drive and build automation wherever possible, enabling the fleet to scale itself as needed.

  • Participate in the on-call rotation and collaborate with the team to improve our 24x7 incident response.

  • Be passionate about giving our customers the best possible experience.

  • Analyze, debug, and solve issues across our infrastructure and platform services.