What You’ll Do
As a Cloud Operations Engineer, you will work as part of the NOC supporting Cisco Secure Access, a collection of integrated, cloud-centric security capabilities that facilitates safe access to websites, software-as-a-service (SaaS) applications and private applications while enforcing security policies. You will join a group of Cloud Operations Engineers based in China, working alongside your other regional team members and your manager, who is located in Australia.
Your typical day will vary, but this is what you will be doing:
- Work in a 24/7 model.
- Monitor the health and performance of the systems/services, and quickly and efficiently respond to incidents/alerts as they arise.
- Investigate and resolve issues, or (depending upon complexity) raise the issue to higher-level Site Reliability Engineers and Developers.
- Support the configuration and maintenance of cloud services.
- Collaborate with Site Reliability Engineers and network teams to pinpoint and troubleshoot incidents.
- Mentor and assist other NOC team members.
- Write documentation to support operational procedures and team technical knowledge.
- Proactively identify process gaps and work on continuous improvement action plans.
- Engage with our vendors and providers to escalate critical infrastructure issues.
- Develop and enhance monitoring, automation and operational tools.
- Participate in the on-call rotation and be flexible to work weekends on a rotational basis.
About you
You have a proven track record of working in a high-transaction, 24x7 production environment and experience running services in a public cloud. You have a solid understanding of cloud deployment concepts and popular AWS services like EC2, EKS and ELB. You know your way around a Linux shell and understand the Linux networking stack, file management, package management, virtual environments and log file inspection. You understand networking concepts such as IPv4/v6 addressing, subnetting and TCP/UDP, and the basics of AWS networking services such as VPC and Transit Gateway.
This role could be a good fit for you if these apply to you:
- Strong verbal and written communication skills in English.
- Problem-solving mindset suited to complex environments involving multiple stakeholders.
- Strong collaboration and teamwork skills.
- Self-taught individual with a proactive mindset and a willingness to learn with minimal guidance.
- Ability to multitask, prioritize tasks, and handle pressure well in a fast-paced environment.
- Previously worked in a fast-paced, 24x7 operational/support environment.
- Experience using Kubernetes and Docker.
- Solid networking knowledge: switching and routing concepts, IPv4/v6 addressing, troubleshooting using traceroute and packet capture tools.
- Hands-on experience with deployment and troubleshooting of containers, distributed applications and microservices.
- Solid understanding of AWS services and concepts such as EC2, EKS, ELB, VPC, Route 53, Direct Connect and Transit Gateway.
- Knowledge of VPN and SASE, security postures and policies.
- Experience with automation tools like Jenkins and CI/CD is a plus.
- Solid knowledge of the Linux environment (networking, system logging, file management, package management, virtual environments, etc.).
- Experience using monitoring tools and metrics to develop and understand SLI/SLOs, identify problems and respond to production alerts.
- Experience using Grafana/Prometheus and AWS CloudWatch is desirable; Splunk is a plus.
- Experience using infrastructure-as-code tools such as Ansible, Puppet or Chef.
- Experience writing scripts in Python and using Git repositories or other SCM tools.
- Experience with automation and integrating with different providers via REST API.
- Experience with ticketing and documentation tools such as Jira, Confluence or similar.