Collaborate with data scientists and research/machine learning engineers to deliver products to production.
Build and maintain scalable infrastructure as code in the cloud (private & public).
Manage infrastructure for model training/serving and governance.
Manage data infrastructure supporting the inference pipelines.
Contribute significantly to architecture and software management discussions & tasks.
Prototype rapidly & shorten development cycles for our software and AI/ML products:
Build infrastructure for our AI/ML data pipelines & workstreams, spanning data analysis, experimentation, model training, model evaluation, deployment, operationalization, tuning, and visualization.
Improve and maintain our automated CI/CD pipeline while collaborating with our stakeholders, various testing partners, and model contributors.
Increase our deployment velocity, streamlining how models and data pipelines are deployed into production.
Requirements
Minimum Bachelor of Science degree in Computer Science, Software Engineering, Electrical Engineering, Computer Engineering, or a related field.
Experience with containerization (Docker, Kubernetes).
3+ years of experience with AWS cloud services (S3, Lambda, Aurora, ECS, EKS, SageMaker, Bedrock, Athena, Secrets Manager, Certificate Manager, etc.).
Proven DevOps/MLOps experience provisioning and maintaining infrastructure leveraging some of the following: Terraform, Ansible, AWS CDK, CloudFormation.
Experience with CI/CD pipelines (e.g., Jenkins, Spinnaker).
Experience with monitoring tools such as Prometheus, Grafana, Splunk, and Datadog.
Proven programming/scripting skills in one or more modern programming languages, such as Python.
Solid software design, problem-solving, and debugging skills.
Strong interpersonal skills; able to work independently as well as in a team.
Desirable
You have a strong commitment to development best practices and code reviews.
You believe in continuous learning, sharing best practices, and encouraging and elevating less experienced colleagues as they learn.
Experience with data labelling, validation, provenance, and versioning.