Your Role and Responsibilities
As the Cloud SRE Manager, your key responsibilities will include:
- Managing a group of Site Reliability Engineers and the team’s day-to-day operations, including all quarterly reviews, evaluations, and career development.
- Monitoring critical infrastructure and applying corrective actions.
- Managing critical customer issues, which requires ongoing communication with customers
- Preparing and delivering training sessions and other presentations
- Providing weekly quality reviews for the team
- Providing a weekly status report showing metrics
- Identifying areas for improvement
- Liaising with the development, operations, network, and storage teams, and driving customer ticket resolution and SLA compliance with these organizations
As the Cloud SRE Manager, you should possess:
- Proven leadership skills
- Strong organizational and time management skills
- The ability to respond promptly to production issues and alerts
- Comfort operating in a fast-paced environment
- Comfort using and navigating a Linux environment
Required Technical and Professional Expertise
- Five (5) years of experience in a technical support/operations manager role, and at least three (3) years of experience in a technical support or development environment (preferably cloud or managed servers)
- History of process improvement, problem solving skills, customer advocacy orientation, and leadership in a cross-functional team environment.
- Excellent leadership and management skills with emphasis on mentoring, motivating, and driving a large team to success.
- Experience implementing team processes and monitoring effectiveness.
- Ability to identify, analyze, prioritize, and resolve daily operational problems and issues.
- Strong written and verbal communication skills.
- Demonstrated leadership and team building skills.
- Energetic, motivated, and customer-focused.
- Ability to quickly adapt to a rapidly changing technology environment.
- Ability to hire, train, and retain quality team members (critical).
- Experience using Splunk and/or other dashboarding tools
- Understanding of web technologies and the web technology stack
- Working knowledge of network and storage technologies
- Working knowledge of ServiceNow, JIRA, Confluence, and GitHub
- ITIL 4 Foundation certification is a plus
Preferred Technical and Professional Expertise
- Understanding of business continuity, fault-tolerant design, and failover architecture
- Automation of production monitoring
- Experience with configuration management systems
- Experience as a support engineer
- Experience with Kubernetes
- Experience with GitHub, Perl, and Python
- Experience with service management tools such as ServiceNow, Jira, Confluence, etc.
- Experience writing scripts