A Readability-Driven Curriculum Learning Method for Data-Efficient Small Language Model Pretraining
Suyun Kim,
Jungwon Park and
Juae Kim
Additional contact information
Suyun Kim: Department of English Linguistics and Language Technology, Hankuk University of Foreign Studies, Seoul 02450, Republic of Korea
Jungwon Park: Department of English Linguistics and Language Technology, Hankuk University of Foreign Studies, Seoul 02450, Republic of Korea
Juae Kim: Department of English Linguistics and Language Technology, Hankuk University of Foreign Studies, Seoul 02450, Republic of Korea
Mathematics, 2025, vol. 13, issue 20, 1-22
Abstract:
Large language models demand substantial computational and data resources, motivating approaches that improve the training efficiency of small language models. While curriculum learning methods based on linguistic difficulty measures have been explored as a potential solution, prior approaches that rely on complex linguistic indices are often computationally expensive, difficult to interpret, or fail to yield consistent improvements. Moreover, existing methods rarely incorporate the cognitive and linguistic efficiency observed in human language acquisition. To address these gaps, we propose a readability-driven curriculum learning method based on the Flesch Reading Ease (FRE) score, which provides a simple, interpretable, and cognitively motivated measure of text difficulty. Across two dataset configurations and multiple curriculum granularities, our method yields consistent improvements over baseline models without curriculum learning, achieving substantial gains on BLiMP and MNLI. Reading behavior evaluations also reveal human-like sensitivity to textual difficulty. These findings demonstrate that a lightweight, interpretable curriculum design can enhance small language models under strict data constraints, offering a practical path toward more efficient training.
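As an illustration of the idea described in the abstract, the sketch below scores text samples with the standard Flesch Reading Ease formula, FRE = 206.835 - 1.015 (words / sentences) - 84.6 (syllables / words), and orders them from easiest to hardest. It is a minimal sketch and not the authors' implementation: the vowel-group syllable counter, the regex tokenization, and the helper names `count_syllables`, `flesch_reading_ease`, `order_easy_to_hard`, and `split_into_stages` are assumptions made for this example, and the abstract does not specify how the curriculum stages were actually formed.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Standard FRE formula; higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def order_easy_to_hard(samples):
    """Sort samples by descending FRE, so the easiest text comes first."""
    return sorted(samples, key=flesch_reading_ease, reverse=True)


def split_into_stages(ordered, n_stages):
    """Cut an ordered list into at most n_stages contiguous chunks.

    The stage count stands in for a curriculum granularity; this
    particular chunking scheme is hypothetical.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    size = max(1, -(-len(ordered) // n_stages))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


if __name__ == "__main__":
    corpus = [
        "Computational requirements necessitate considerable infrastructure.",
        "The cat sat.",
        "We read the book and then we went home.",
    ]
    for stage_idx, stage in enumerate(split_into_stages(order_easy_to_hard(corpus), 2)):
        print(stage_idx, stage)
```

A training loop could then present the stages in order, easiest first. A dictionary-based syllable counter would give more faithful scores than the vowel-run heuristic used here.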
Keywords: curriculum learning; pre-training; Flesch Reading Ease score; language model
JEL-codes: C
Date: 2025
Downloads:
https://www.mdpi.com/2227-7390/13/20/3300/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/20/3300/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:20:p:3300-:d:1772436
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.