As a Senior MLOps Engineer, you will be pivotal in designing, implementing, and maintaining our AI cloud infrastructure. You will join a highly technical team responsible for developing our model-serving platform, optimizing inference, and ensuring reliability and cost efficiency. In this role, you will collaborate with key stakeholders across the company, including product engineering teams, internal clients (both researchers and engineers), Security, and DevOps, to automate processes, streamline operations, and drive impactful changes in our model-serving strategy.
Key Responsibilities:
- AI Infrastructure Design & Implementation: Lead the design and implementation of our cloud-native infrastructure, ensuring it is performant, scalable, and cost-effective.
- Automation & Streamlining: Help automate and streamline our AI operations and processes, enhancing efficiency and reducing manual intervention.
- Cross-Functional Collaboration: Work closely with product engineering teams, internal clients (both researchers and engineers), Security, DevOps, and others to ensure adoption of our AI platform.
- Impactful Contribution: Play a major part in shaping the company’s AI serving strategy, focusing on innovation and long-term success.
Qualifications:
- 4+ years as a DevOps/MLOps Engineer, with extensive experience in cloud computing technologies, preferably AWS.
- Strong scripting and automation skills, ideally using Python and Bash.
- Proficiency with Infrastructure as Code (IaC) tools such as Terraform and Terragrunt.
- Hands-on experience with GitOps and CI/CD methodologies and tools like ArgoCD, Argo Workflows, GitHub Actions, and Jenkins.
- Proven experience working with Kubernetes (K8s) in production environments.
- Experience with common model-serving solutions such as Ray, KServe, Triton, HF, etc.
- Experience with system observability and monitoring.
- A solid understanding of computer networking fundamentals, storage, and REST/gRPC architecture.
- Excellent communication skills and the ability to execute projects from design to implementation.
We operate in a flexible hybrid work model.