MongoDB: Lead Engineer, Inference Platform
United States, California, Palo Alto 
870240681

As a Lead Engineer, Inference Platform, you’ll be hands-on with design and implementation, while working with engineers across experience levels to build a robust, scalable system. The focus is on latency, availability, observability, and scalability in a multi-tenant, cloud-native environment. You will also be responsible for guiding the technical direction of the team, mentoring junior engineers, and ensuring the delivery of high-quality, impactful features.

What You’ll Do
  • Partner with Search Platform and Voyage.ai AI engineers and researchers to productionize state-of-the-art embedding models and rerankers, supporting both batch and real-time inference
  • Lead key projects around performance optimization, GPU utilization, autoscaling, and observability for the inference platform
  • Design and build components of a multi-tenant inference service that integrates with Atlas Vector Search, driving capabilities for semantic search and hybrid retrieval
  • Contribute to platform features like model versioning, safe deployment pipelines, latency-aware routing, and model health monitoring
  • Collaborate with peers across ML, infra, and product teams to define architectural patterns and operational practices that support high availability and low latency at scale
  • Guide decisions on model serving architecture using tools such as vLLM and ONNX Runtime, along with container orchestration on Kubernetes
  • Provide technical leadership and mentorship to junior engineers, fostering a culture of technical excellence and continuous improvement within the team
Who You Are
  • 8+ years of engineering experience in backend systems, ML infrastructure, or scalable platform development, and the ability to provide technical leadership and guidance to a team of engineers
  • Expertise in serving embedding models in production environments
  • Strong systems skills in languages like Go, Rust, C++, or Python, and experience profiling and optimizing performance
  • Comfortable working on cloud-native distributed systems, with a focus on latency, availability, and observability
  • Familiarity with inference runtimes and with vector search libraries and algorithms (e.g., Faiss, ScaNN, HNSW)
  • Proven ability to collaborate across disciplines and experience levels, from ML researchers to junior engineers
  • Experience with high-scale SaaS infrastructure, particularly in multi-tenant environments
  • 1+ years of experience serving as technical lead (TL) for a large-scale ML inference or training platform software project
Nice to Have
  • Prior experience working with model teams on inference-optimized architectures
  • Background in hybrid retrieval, prompt-based pipelines, or retrieval-augmented generation (RAG)
  • Contributions to relevant open-source ML serving infrastructure
  • 1+ years of experience managing a technical team focused on ML inference or training infrastructure
Why Join Us
  • Be part of shaping the future of AI-native developer experiences on the world’s most popular developer data platform
  • Collaborate with ML experts from Voyage.ai to bring cutting-edge research into production at scale
  • Solve hard problems in real-time inference, model serving, and semantic retrieval in a system used by thousands of customers worldwide
  • Work in a culture that values mentorship, autonomy, and strong technical craft
  • Competitive compensation, equity, and career growth in a hands-on technical leadership role
$270,000 USD