Evaluating Embeddings from Pre-Trained Language Models and Knowledge Graphs for Educational Content Recommendation

Xiu Li, Aron Henriksson, Martin Duneld, Jalal Nouri and Yongchao Wu
Additional contact information
All authors: Department of Computer and Systems Sciences, Stockholm University, NOD-Huset, Borgarfjordsgatan 12, 16455 Stockholm, Sweden

Future Internet, 2023, vol. 16, issue 1, 1-21

Abstract: Educational content recommendation is a cornerstone of AI-enhanced learning. In particular, to facilitate navigating the diverse learning resources available on learning platforms, methods are needed for automatically linking learning materials, e.g., in order to recommend textbook content based on exercises. Such methods are typically based on semantic textual similarity (STS) and the use of embeddings for text representation. However, it remains unclear what types of embeddings should be used for this task. In this study, we carry out an extensive empirical evaluation of embeddings derived from three different types of models: (i) static embeddings trained using a concept-based knowledge graph, (ii) contextual embeddings from a pre-trained language model, and (iii) contextual embeddings from a large language model (LLM). In addition to evaluating the models individually, various ensembles are explored based on different strategies for combining two models in an early vs. late fusion fashion. The evaluation is carried out using digital textbooks in Swedish for three different subjects and two types of exercises. The results show that using contextual embeddings from an LLM leads to superior performance compared to the other models, and that there is no significant improvement when combining these with static embeddings trained using a knowledge graph. When using embeddings derived from a smaller language model, however, it helps to combine them with knowledge graph embeddings. The performance of the best-performing model is high for both types of exercises, resulting in a mean Recall@3 of 0.96 and 0.95 and a mean MRR of 0.87 and 0.86 for quizzes and study questions, respectively, demonstrating the feasibility of using STS based on text embeddings for educational content recommendation. The ability to link digital learning materials in an unsupervised manner—relying only on readily available pre-trained models—facilitates the development of AI-enhanced learning.
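The abstract describes ranking textbook content for a given exercise by semantic textual similarity (STS) of embeddings, combining two embedding models either at the vector level (early fusion) or at the score level (late fusion), and evaluating with Recall@3 and MRR. The sketch below illustrates these notions in Python; it is not the authors' implementation, and the function names, the concatenation/weighted-average fusion choices, and the random toy data are all illustrative assumptions.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def early_fusion(u, v):
    # Early fusion: L2-normalise each model's vector, then concatenate,
    # so neither model dominates the combined representation.
    return np.concatenate([u / np.linalg.norm(u), v / np.linalg.norm(v)])

def late_fusion(score_a, score_b, w=0.5):
    # Late fusion: combine the two models' similarity *scores* instead.
    return w * score_a + (1.0 - w) * score_b

def rank_sections(query_vec, section_vecs):
    # Rank candidate textbook sections by descending similarity to the query.
    scores = [cosine(query_vec, s) for s in section_vecs]
    return sorted(range(len(scores)), key=lambda i: -scores[i])

def recall_at_k(ranking, relevant, k=3):
    # Fraction of relevant sections retrieved within the top k.
    return len(set(ranking[:k]) & set(relevant)) / len(relevant)

def mrr(ranking, relevant):
    # Reciprocal rank of the first relevant section (0 if none is retrieved).
    for pos, idx in enumerate(ranking, start=1):
        if idx in relevant:
            return 1.0 / pos
    return 0.0

# Toy usage: random vectors stand in for real language-model (LM) and
# knowledge-graph (KG) embeddings of one exercise and ten sections.
rng = np.random.default_rng(0)
ex_lm, ex_kg = rng.normal(size=64), rng.normal(size=32)
secs_lm = [rng.normal(size=64) for _ in range(10)]
secs_kg = [rng.normal(size=32) for _ in range(10)]
fused = [early_fusion(a, b) for a, b in zip(secs_lm, secs_kg)]
ranking = rank_sections(early_fusion(ex_lm, ex_kg), fused)
print(recall_at_k(ranking, {3}, k=3), mrr(ranking, {3}))
```

Concatenating per-model normalised vectors is one common early-fusion choice and score averaging one common late-fusion choice; the fusion strategies actually compared in the paper may differ.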

Keywords: educational content recommendation; AI-enhanced learning; pre-trained language models; ensemble embeddings; knowledge graph embeddings; text similarity; textual semantic search; natural language processing
JEL-codes: O3
Date: 2023
References: complete reference list available from CitEc
Citations: 1 (tracked in EconPapers)

Downloads: (external link)
https://www.mdpi.com/1999-5903/16/1/12/pdf (application/pdf)
https://www.mdpi.com/1999-5903/16/1/12/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:16:y:2023:i:1:p:12-:d:1309732

Future Internet is currently edited by Ms. Grace You

More articles in Future Internet from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jftint:v:16:y:2023:i:1:p:12-:d:1309732