Architect and implement scalable, user-friendly tools that track and visualize the lifecycle of machine learning experiments and models across AI workflows
Build robust tools and infrastructure that improve the machine learning team's velocity. The work ranges from training and evaluation code in Python to back-end and front-end work in JavaScript
Collaborate closely with ML engineers to ensure tools are aligned with research needs
Design dashboards that give our ML engineers and leadership real-time insight into performance and progress
Coordinate hardware resource needs with the team that manages the cluster hardware to maintain high availability
What You’ll Bring
Strong knowledge of Python, React, and Linux
Experience working with backend infrastructure components (relational databases, in-memory caches, message brokers)
Experience building modern web applications using Flask/Django and React/Redux or similar component-based libraries
Experience deploying services on Kubernetes and setting up CI/CD pipelines
Solid understanding of security principles and best practices
UI and graphic design sensibilities
Experience working with HPC clusters
Knowledge of machine learning, computer vision, or neural networks