
Supplementing CEFR-graded vocabulary lists for language learners by leveraging information on dictionary views, corpus frequency, part-of-speech, and polysemy

Sascha Wolfer and Robert Lew
Additional contact information
Sascha Wolfer: Leibniz Institute for the German Language (IDS)
Robert Lew: Adam Mickiewicz University

Palgrave Communications, 2025, vol. 12, issue 1, 1-11

Abstract: The study explores an approach to supplementing existing CEFR-graded vocabulary lists, which are often incomplete, by imputing CEFR levels for additional vocabulary items. This is achieved by analysing word-level data such as dictionary views, corpus frequency, part-of-speech, and polysemy. Using English as a test case, the study employs a variety of machine-learning models to predict CEFR levels for words not included in the initial set. The models significantly outperform a random baseline, indicating their effectiveness. The findings suggest that corpus frequency is the most influential predictor, followed by dictionary views and polysemy. The study reveals the potential of this semi-automatic approach to expand CEFR-graded word lists, making them more comprehensive and accessible for language learners. At the same time, human oversight is recommended to ensure the appropriateness of the imputed words for language learners, such as regarding the inclusion of potentially offensive terms. Future research may extend this methodology to other languages, provided that sufficient linguistic data is available.

Date: 2025
Downloads: (external link)
http://link.springer.com/10.1057/s41599-025-05446-y Abstract (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:pal:palcom:v:12:y:2025:i:1:d:10.1057_s41599-025-05446-y

Ordering information: This journal article can be ordered from
https://www.nature.com/palcomms/about

DOI: 10.1057/s41599-025-05446-y

More articles in Palgrave Communications from Palgrave Macmillan

 
Handle: RePEc:pal:palcom:v:12:y:2025:i:1:d:10.1057_s41599-025-05446-y