Expoint - all jobs in one place

Red Hat Data Engineer
India, Karnataka, Bengaluru 
145807517

15.12.2024

Data Engineer: ROAD

Job description

Primary Job Responsibilities

  • Design the systems, integrations, and processes required to achieve the best fine-tuning results, including selection and integration of data sources, data pre-processing, and subsequent quality evaluation.

  • Design, build, and maintain scalable data pipelines for extracting, transforming, and loading (ETL) data from internal Red Hat systems into the LLM training process.

  • Develop and optimize databases to ensure efficient data storage and retrieval.

  • Design and develop data warehousing solutions to support large-scale data storage.

  • Use Python for data manipulation, automation, and analysis. Ensure that high-quality data is used as input for model fine-tuning and RAG building.

  • Contribute to the entire stack, from active participation in the fine-tuning process to the implementation and ongoing optimization of the designed systems.

  • Collaborate with other team members (data scientists, software engineers) as well as other teams to deliver a best-in-class solution and maintain it.

  • Work in a fast-paced, agile, globally distributed environment of talented engineers.

Required Skills

  • Bachelor’s degree in Computer Science, Data Science, or a related field

  • 2-4 years of work experience in data engineering, preferably in AI/ML contexts

  • Extensive, advanced experience with Python development.

  • Strong understanding of LLM architectures, training processes, and data requirements

  • Experience with RAG systems, knowledge base construction, and vector databases

  • Familiarity with embedding techniques, similarity search algorithms, and information retrieval concepts

  • Hands-on experience with data cleaning, tagging, and annotation processes (both manual and automated)

  • Strong self-motivation, problem solving and organizational skills.

  • Collaborative attitude and willingness to share ideas openly.

  • Excellent English written and verbal communication skills.

  • Ability to quickly learn and use new tools and technologies

Preferred skills

  • Experience with AI and machine learning platforms, tools, and frameworks, such as TensorFlow, PyTorch, llama.cpp, and Kubeflow.

  • Familiarity with LLM parameters such as temperature, top-k, and repeat penalty, and with metrics and methodologies for evaluating LLM outputs.

  • Experience with various vector store technologies and their applications in AI

  • Experience with Cloud Native Technologies and Platforms (e.g. Kubernetes)

  • Understanding of data lakehouse concepts and architectures

  • Experience with agile development, CI/CD systems and DevOps methodology

  • Experience with big data storage formats and services, such as Parquet, Avro, and S3.