Expoint - all jobs in one place

SAP Internship Data Scientist F/M 
France 
444998666

06.02.2025

What you'll do

Goals:

  • Apply Natural Language Processing (NLP) techniques to analyze and structure a subset of SAP Knowledge Base Articles (KBAs) and SAP Help documents on SAP HANA DB.
  • Contribute to the development of a knowledge graph inspired by LightRAG, utilizing NER and relation extraction.
  • Topic Modeling and Clustering:
    • Cluster KBAs into groups using various clustering algorithms (e.g., K-means, hierarchical clustering).
    • Apply topic modeling techniques (e.g., Latent Dirichlet Allocation (LDA), Non-Negative Matrix Factorization (NMF), BERTopic, FASTopic) to identify underlying topics in the clustered KBAs.
    • Analyze the clusters and summarize the elements within each cluster using Large Language Models (LLMs) such as BERT, RoBERTa, or other transformer-based models.
  • Named Entity Recognition (NER) and Information Extraction:
    • Apply NER techniques to extract relevant entities (e.g., product names, functions, references) from the KBAs and Help Documents.
    • Use information extraction techniques to structure the documents into a more organized format.
    • Identify relationships between documents based on metadata provided in JSON (e.g., related products, product functions).
  • Knowledge Graph Construction:
    • Design and implement a knowledge graph inspired by LightRAG, utilizing the extracted entities and relationships.
    • Integrate the knowledge graph with existing SAP systems or tools, as needed.

Deliverables:

  • A report detailing the clustering and topic modeling results.
  • A knowledge graph inspired by LightRAG, integrating the extracted entities and relationships from the KBAs and Help Documents.
  • A presentation summarizing the project outcomes and contributions to the SAP Labs team.
  • A written document (e.g., thesis, research paper) detailing the methodology, results, and conclusions of the internship project.

What you bring

NLP Fundamentals:

  • Understanding of NLP concepts, such as tokenization, stemming, lemmatization, sentiment analysis, and topic modeling.
  • Familiarity with popular NLP libraries and frameworks (e.g., NLTK, spaCy, scikit-learn).

Deep Learning Techniques:

  • Experience with deep learning models for NLP tasks, such as BERT, RoBERTa, and other transformer-based models.
  • Understanding of techniques for fine-tuning pre-trained language models on specific tasks.

Programming Skills:

  • Proficiency in programming languages (e.g., Python, Java).
  • Familiarity with popular libraries and frameworks (e.g., TensorFlow, PyTorch).

Data Analysis and Visualization:

  • Experience with data analysis and visualization tools (e.g., Pandas, NumPy, Matplotlib, Seaborn).

Collaboration and Communication:

  • Strong communication and collaboration skills to work effectively with the SAP Labs team.
  • Desire and ability to work closely with a team, both local and remote. A good written and spoken command of English, French, and German is a plus.
  • Ability to think innovatively and work autonomously.

Meet your team

  • You will work closely with the SAP Labs team, which consists of experienced and passionate professionals in the field of information technology and software.
  • This team will provide the necessary support and enable you to thrive in an innovative and international environment.
  • We are confident that the tasks of this internship are both complex and international in scope, as we are a leading global company in the field of information technology and software.