We are seeking a Data Engineer I who will support the GenAI-powered insights assistant by building pipelines that process unstructured data (knowledge articles and documents) in the S3 Data Lakehouse. You'll manage vector databases that store embeddings, helping the AI retrieve relevant information quickly and accurately.

Key job responsibilities
- Develop metadata pipelines to tag documents with freshness, ownership, and other context for better filtering.
- Implement caching and multi-region replication to reduce query latency.
- Monitor data retrieval accuracy and log source citations to improve AI trustworthiness.
- Automate ingestion and embedding generation for unstructured data into vector databases such as Zilliz, Pinecone, or OpenSearch.
Basic qualifications
- 1+ years of data engineering experience
- Experience with data modeling, warehousing, and building ETL pipelines
- Experience with one or more query languages (e.g., SQL, PL/SQL, DDL, MDX, HiveQL, SparkSQL, Scala)
- Experience with one or more scripting languages (e.g., Python, KornShell)
- Experience with big data technologies such as Hadoop, Hive, Spark, and EMR
- Experience with an ETL tool such as Informatica, ODI, SSIS, BODI, or DataStage
Preferred qualifications
- Strong expertise in AWS Glue, Redshift, Kinesis/MSK, and Lambda.
- Hands-on with data contracts, lineage tracking, and automated QA.
- Familiarity with multi-modal data ingestion (structured + unstructured).
- Experience operationalizing cross-region replication and caching strategies.