A Readability-Driven Curriculum Learning Method for Data-Efficient Small Language Model Pretraining

Suyun Kim, Jungwon Park and Juae Kim
Suyun Kim: Department of English Linguistics and Language Technology, Hankuk University of Foreign Studies, Seoul 02450, Republic of Korea
Jungwon Park: Department of English Linguistics and Language Technology, Hankuk University of Foreign Studies, Seoul 02450, Republic of Korea
Juae Kim: Department of English Linguistics and Language Technology, Hankuk University of Foreign Studies, Seoul 02450, Republic of Korea

Mathematics, 2025, vol. 13, issue 20, 1-22

Abstract: Large language models demand substantial computational and data resources, motivating approaches that improve the training efficiency of small language models. While curriculum learning methods based on linguistic difficulty measures have been explored as a potential solution, prior approaches that rely on complex linguistic indices are often computationally expensive, difficult to interpret, or fail to yield consistent improvements. Moreover, existing methods rarely incorporate the cognitive and linguistic efficiency observed in human language acquisition. To address these gaps, we propose a readability-driven curriculum learning method based on the Flesch Reading Ease (FRE) score, which provides a simple, interpretable, and cognitively motivated measure of text difficulty. Across two dataset configurations and multiple curriculum granularities, our method yields consistent improvements over baseline models without curriculum learning, achieving substantial gains on BLiMP and MNLI. Reading behavior evaluations also reveal human-like sensitivity to textual difficulty. These findings demonstrate that a lightweight, interpretable curriculum design can enhance small language models under strict data constraints, offering a practical path toward more efficient training.
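The abstract names the Flesch Reading Ease (FRE) score as the difficulty measure but does not give the formula or say how texts are grouped into curriculum stages. Below is a minimal Python sketch under stated assumptions. It uses the standard Flesch formula, FRE = 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words), where higher scores mean easier text. It also assumes a crude vowel-group syllable counter and an easy-to-hard ordering split into a chosen number of stages. The helper names (`count_syllables`, `flesch_reading_ease`, `build_curriculum`) and the `num_stages` parameter are hypothetical. This is not the authors' implementation.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting runs of vowels (a, e, i, o, u, y).

    Assumption: a rough heuristic; production tools typically use a
    pronunciation dictionary or more detailed rules.
    """
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    # Treat a trailing 'e' as silent ("make"), except in "-le" endings ("simple").
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)  # every word has at least one syllable


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease score; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def build_curriculum(docs: list[str], num_stages: int) -> list[list[str]]:
    """Order documents easiest-first (highest FRE) and split them into
    num_stages contiguous, near-equal stages (hypothetical granularity knob)."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ranked = sorted(docs, key=flesch_reading_ease, reverse=True)
    n = len(ranked)
    return [ranked[i * n // num_stages:(i + 1) * n // num_stages]
            for i in range(num_stages)]


easy = "The cat sat on the mat."
hard = "Computational linguistics investigates representations."
print(flesch_reading_ease(easy))          # about 116.145
print(build_curriculum([hard, easy], 2))  # easy stage first, then hard
```

Sorting on descending FRE presents the most readable text first. Changing `num_stages` corresponds loosely to the "curriculum granularity" the abstract mentions, but how the paper actually defines its stages is not stated here.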

Keywords: curriculum learning; pre-training; Flesch Reading Ease score; language model
JEL-codes: C
Date: 2025

Downloads:
https://www.mdpi.com/2227-7390/13/20/3300/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/20/3300/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:20:p:3300-:d:1772436

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-11-15
Handle: RePEc:gam:jmathe:v:13:y:2025:i:20:p:3300-:d:1772436