MongoDB: Senior Data Engineer, GenAI
United States, New York, New York 
586110544

24.06.2024
What you’ll do:
  • Build ETL pipelines using technologies such as Python and Spark
  • Implement new ETL pipelines on top of a variety of architectures (e.g. file-based, streaming)
  • Determine the best strategies for building AI tools, including how to chunk and retrieve data for RAG and which LLMs are most appropriate for each use case
  • Stay abreast of industry trends in the AI space, and evaluate and incorporate new concepts/tools into MongoDB’s internal AI architecture
  • Make architectural decisions relating to storing large datasets using a variety of file formats (e.g. Parquet, JSON) and table types (e.g. Iceberg, Hive)
  • Work with Security and Compliance teams to ensure that datasets have appropriate permissions and comply with applicable regulations
  • Work with Data Analysts and Data Scientists to understand and make available the data that is important for their analysis
  • Work with our Data Platform, Architecture, and Governance sibling teams to make data scalable, consumable, and discoverable
We’re looking for someone with:
  • 5+ years of experience building ETL pipelines for a Data Lake/Warehouse
  • 1+ year of experience building AI and RAG-based applications
  • 5+ years of Python experience
  • 5+ years of Spark experience
  • Experience with Hive, Iceberg, Glue, or other technologies that expose big data as tables
  • Familiarity with different big data file types such as Parquet, Avro, and JSON

Success Measures:

  • In 3 months, you'll have a thorough understanding of the architecture of MongoDB’s internal Data Lake and AI ecosystem
  • In 6 months, you'll have owned the delivery of a large project from start (scoping, design) to finish (delivery)
  • In 12 months, you'll have designed new features, led development work, and become a go-to expert on parts of the system
$231,000 USD