Data Engineer: ROAD
Job description
Primary Job Responsibilities
Design the systems, integrations, and processes required to achieve the best fine-tuning results, including the selection and integration of data sources, data pre-processing, and subsequent quality evaluation.
Design, build, and maintain scalable data pipelines for extracting, transforming, and loading (ETL) data from internal Red Hat systems into the LLM training process.
Develop and optimize databases to ensure efficient data storage and retrieval.
Design and develop data warehousing solutions to support large-scale data storage.
Use Python for data manipulation, automation, and analysis. Ensure that high-quality data is used as input for model fine-tuning and retrieval-augmented generation (RAG) construction.
Contribute to the entire stack, from active participation in the fine-tuning process to the implementation and ongoing optimization of the designed systems.
Collaborate with other team members (data scientists, software engineers) as well as other teams to deliver and maintain a best-in-class solution.
Work in a fast-paced, agile, globally distributed environment of talented engineers.
Required Skills
Bachelor’s degree in Computer Science, Data Science, or a related field
2-4 years of work experience in data engineering, preferably in AI/ML contexts
Extensive, advanced experience in Python development.
Strong understanding of LLM architectures, training processes, and data requirements
Experience with RAG systems, knowledge base construction, and vector databases
Familiarity with embedding techniques, similarity search algorithms, and information retrieval concepts
Hands-on experience with data cleaning, tagging, and annotation processes (both manual and automated)
Strong self-motivation, problem-solving, and organizational skills.
Collaborative attitude and willingness to share ideas openly.
Excellent English written and verbal communication skills.
Ability to quickly learn and use new tools and technologies
Preferred skills
Experience with AI and machine learning platforms, tools, and frameworks, such as TensorFlow, PyTorch, llama.cpp, and Kubeflow.
Familiarity with LLM sampling parameters such as temperature, top-k, and repeat penalty, as well as with data science metrics and methodologies for evaluating LLM outputs.
Experience with various vector store technologies and their applications in AI
Experience with cloud-native technologies and platforms (e.g., Kubernetes)
Understanding of data lakehouse concepts and architectures
Experience with agile development, CI/CD systems, and DevOps methodologies
Experience with big data storage formats and technologies, such as Parquet, Avro, and S3.