Job responsibilities:
- Provide technical leadership and guidance to the cloud engineering team
- Lead the design and development of the cloud infrastructure offerings and platform tools, ensuring that they are secure, scalable, and reliable
- Stay up to date with the latest advancements in cloud technologies and recommend new tools and technologies for adoption and implementation
- Develop secure, high-quality production code, perform code reviews, and debug issues
- Partner with development teams who create our customer experience to identify and eliminate bottlenecks
- Analyze performance characteristics of systems across our platform and improve resiliency and security posture
- Gather insights and provide actionable intelligence to optimize infrastructure usage and costs
- Design and develop scalable AIOps solutions to support AI/ML and Data Platforms
- Implement data pipelines and workflows to collect, process, and analyze large volumes of platform data in real time
- Ensure the reliability, availability, and performance of the AIOps platform through effective monitoring and maintenance
- Develop and deploy agentic systems and agents to automate routine tasks and processes, enhancing operational efficiency
Required qualifications, capabilities, and skills:
- Bachelor’s degree in Computer Science, Data Engineering, or a related field
- Proven experience in platform engineering, with a focus on AI/ML technologies and IT operations
- Formal training or certification in software engineering concepts and 7+ years of applied experience
- Hands-on experience with one or more cloud platform providers (AWS, Azure, or GCP)
- Advanced knowledge of containerization and container runtime/orchestration platforms (Docker, Kubernetes, ECS, etc.)
- Hands-on experience with cloud infrastructure provisioning tools such as Terraform, Pulumi, or Crossplane
- Proficiency in programming languages such as Golang or Python and an understanding of software development best practices
- Hands-on experience with CI/CD and SCM tools such as Jenkins, Spinnaker, Bitbucket, or GitHub, and with logging and monitoring tools such as Splunk, Grafana, Datadog, or Prometheus
- Deep understanding of cloud infrastructure design, architecture, and cloud migration strategies
- Strong knowledge of cloud security best practices, shift-left methodologies, and DevSecOps processes
- Experience in designing and developing scalable AI platforms
Preferred qualifications, capabilities, and skills:
- Master's degree in a related field
- Experience implementing multi-cloud architectures
- Certifications in target areas (Cloud, Kubernetes, IaC, etc.)
- Experience leading end-to-end platform development efforts