Responsibilities:
- Design, implement, and maintain the infrastructure for our machine learning models and applications, including data pipelines, workflows, and data storage solutions.
- Collaborate with data scientists and software engineers to develop and deploy machine learning models, ensuring seamless integration with our infrastructure and data pipelines.
- Ensure the scalability, reliability, and performance of our machine learning applications, using Docker, Kubernetes, and cloud platforms such as GCP.
- Develop and maintain automated testing and deployment scripts, using tools like Jenkins, GitLab CI/CD, or CircleCI.
- Monitor our machine learning infrastructure and troubleshoot issues, using tools like Prometheus, Grafana, and the ELK Stack.
- Participate in code reviews and contribute to the development of new features and tools for our machine learning infrastructure.
Requirements:
- Bachelor's degree in Computer Science, Computer Engineering, or a related field.
- 12+ years of experience in DevOps, cloud computing, and machine learning.
- Strong understanding of cloud providers like AWS, GCP, or Azure.
- Experience with containerization using Docker and orchestration using Kubernetes.
- Familiarity with machine learning frameworks like TensorFlow, PyTorch, or scikit-learn.
- Experience with data pipelines, data warehousing, and data lakes.
- Strong scripting skills in languages like Python, Bash, or PowerShell.
- Experience with CI/CD tools like Jenkins, GitLab CI/CD, or CircleCI.
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration skills.
Compensation and Benefits:
The annual base salary range for this position is $141,000 - $225,000.
This position is also eligible for a discretionary annual bonus in accordance with relevant plan documents, and equity in accordance with equity plan documents and equity award agreements.