Job responsibilities
- Implements a distributed ML experimentation and training platform for firm-wide use in accordance with the requirements and design.
- Implements and supports tools and workflows to facilitate machine learning experiments, automated training runs, and production deployments.
- Extends machine learning libraries and frameworks to support complex requirements.
- Delivers a thoughtful data scientist experience with APIs and SDKs for the training platform.
- Collaborates with infrastructure engineering, product management, and security and compliance teams to deliver tailored, robust solutions.
Required qualifications, capabilities, and skills
- Formal training or certification on software engineering concepts and 3+ years of applied experience.
- Knowledge of software development processes in an ML environment.
- Understanding of and hands-on experience with public cloud technologies, especially AWS, in the context of ML engineering workflows, specifically featurization, experimentation, training, and evaluation.
- Programming skills in Python and experience with ML frameworks and libraries such as TensorFlow, PyTorch, Scikit-Learn, and JAX.
- Hands-on experience implementing DevOps practices using tools such as Docker, Jenkins, Spinnaker, and Terraform.
- Knowledge of Big Data and related technologies such as Hadoop, Spark, and Airflow.
Preferred qualifications, capabilities, and skills
- Knowledge of SageMaker, EMR, and the AWS ML stack.
- Knowledge of the Kubernetes ecosystem, including EKS, Helm, and custom operators.