-Develop and maintain online inference services that provide real-time predictions with low latency and high reliability.
-Optimize and deploy large language models (LLMs) for efficient, scalable inference, ensuring high performance and low latency in production environments.
-Maintain and improve a model registry to facilitate the discovery, versioning, and governance of machine learning models.
-Participate in and improve ML Platform incident management and support workflows.
What we offer
Learning. You will develop libraries, tools, and services that make productizing ML models efficient and reliable. You will have the opportunity to work with stunning colleagues who value collaboration and have a wealth of experience you can tap into.
Who will be successful in this role?
- You are highly customer- and developer-driven, and empathetic. You focus on delivering customer and user value with an excellent customer service mentality.
- You have a strong understanding of how to build scalable and efficient model serving solutions that support large-scale inference for generative models and large language models (LLMs). You create solutions that your stakeholders love, and you drive development from planning through implementation to delivery.
- You can successfully execute changes within a team's systems, including developing, testing, deploying, and revising solutions.
- You can communicate and collaborate effectively (e.g. project meetings, team meetings, code reviews) with immediate team peers and cross-functional project teams.
- You are eager to go both deep and wide on ML-facing projects. When a project needs deep technical expertise in a domain area, you get up to speed quickly. When a project requires breadth of focus, you are eager to do what’s needed to deliver value, even if it means going outside of your comfort zone.
Skills:
- Strong programming skills, particularly in languages such as Python and Java, and familiarity with ML libraries and frameworks such as TensorFlow and PyTorch.
- Familiarity with tools and techniques for deploying machine learning models into production environments, with a particular emphasis on GPU inference optimization (e.g., Triton Inference Server, TensorRT), as well as containerization (e.g., Docker) and orchestration (e.g., Kubernetes).
- Experience with data handling, preprocessing, and transformation techniques to prepare data for model inference.
- Demonstrated industry-leading experience in large-scale build, release, CI/CD, and observability techniques, with particular emphasis on multi-language environments spanning Scala, Java, and Python.
- A track record of adopting and promoting best practices in operations, including observability, logging, reporting, and on-call processes, to ensure engineering excellence.
Our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top of market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience. The range for this role is $100,000 - $464,000.