As a key member of our team, you will directly influence the reliability of our cloud services on hyperscaler platforms. Your expertise will help reduce MTTR (Mean Time to Repair) and keep our cloud services healthy and available. You will promote operational excellence by leading improvements and innovation. Your work will maintain and enhance the company's technology platform, delivering a high level of service and supporting overall business performance.
What you'll do
- Liaison with Hyperscalers: Act as the primary interface between our internal teams and hyperscalers (AWS, Microsoft Azure). Maintain strong relationships and ensure effective communication during normal operations and incident response.
- Process Management: Oversee complex processes to ensure maximum efficiency, focusing on intervention, control, enhancement, and monitoring tasks specific to hyperscaler platforms.
- Knowledge Management: Establish strategies for creating, sharing, using, and managing knowledge using AI (Large Language Models) and organizational information, with a focus on hyperscaler-specific knowledge.
- Incident Management: Manage the team that monitors cloud services and promptly escalates any events or failures to internal teams. Work closely with hyperscaler support teams to improve timely incident recognition.
- Automation and Scripting: Use Python to develop scripts and tools that automate routine tasks, improve monitoring, and help restore services quickly when issues occur.
- Continuous Improvement: Collaborate with internal teams to identify areas for improvement in cloud operations, and develop strategies to enhance system resilience and performance.
What you bring
- Education: Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Experience:
  - 3+ years of experience working with hyperscaler platforms (AWS, Microsoft Azure).
  - Proven experience in a liaison or escalation management role, preferably within a cloud environment.
  - Strong knowledge of cloud infrastructure, including networking, storage, and security.
- Technical Skills:
  - Solid understanding of cloud services, knowledge management, and operational processes, with a focus on hyperscaler operations.
  - Proficiency in Python for scripting and automation.
  - Familiarity with monitoring tools and incident management frameworks.
- Soft Skills:
  - Excellent communication skills, both verbal and written.
  - Demonstrated experience working with offshore third parties and internal teams.
  - Strong problem-solving skills with the ability to remain calm under pressure.
  - Excellent leadership and decision-making skills.
  - Ability to work collaboratively across teams and manage multiple priorities.
Preferred Qualifications:
- Certifications such as AWS Certified Solutions Architect, Azure Administrator, or similar.
- Experience with DevOps practices and tools (e.g., CI/CD, Docker, Kubernetes).
- Familiarity with ITIL or other IT service management frameworks.
Job Segment: Cloud, ERP, Operations Manager, Solution Architect, Developer, Technology, Operations