Identifying Literary Microgenres and Writing Style Differences in Romanian Novels with ReaderBench and Large Language Models
Aura Cristina Udrea,
Stefan Ruseti,
Vlad Pojoga,
Stefan Baghiu,
Andrei Terian and
Mihai Dascalu
Additional contact information
Aura Cristina Udrea: Faculty of Automatic Control and Computers, National University of Science and Technology POLITEHNICA Bucharest, 313 Splaiul Independentei, 060042 Bucharest, Romania
Stefan Ruseti: Faculty of Automatic Control and Computers, National University of Science and Technology POLITEHNICA Bucharest, 313 Splaiul Independentei, 060042 Bucharest, Romania
Vlad Pojoga: Faculty of Letters and Arts, Lucian Blaga University of Sibiu, Bulevardul Victoriei 10, 550024 Sibiu, Romania
Stefan Baghiu: Faculty of Letters and Arts, Lucian Blaga University of Sibiu, Bulevardul Victoriei 10, 550024 Sibiu, Romania
Andrei Terian: Faculty of Letters and Arts, Lucian Blaga University of Sibiu, Bulevardul Victoriei 10, 550024 Sibiu, Romania
Mihai Dascalu: Faculty of Automatic Control and Computers, National University of Science and Technology POLITEHNICA Bucharest, 313 Splaiul Independentei, 060042 Bucharest, Romania
Future Internet, 2025, vol. 17, issue 9, 1-28
Abstract:
Recent developments in natural language processing, particularly large language models (LLMs), create new opportunities for literary analysis in underexplored languages like Romanian. This study investigates stylistic heterogeneity and genre blending in 175 late 19th- and early 20th-century Romanian novels, each classified by literary historians into one of 17 genres. Our findings reveal that most novels do not adhere to a single genre label but instead combine elements of multiple (micro)genres, challenging traditional single-label classification approaches. We employed a dual computational methodology that combines an analysis based on Romanian-tailored linguistic features with general-purpose LLMs. ReaderBench, a Romanian-specific framework, was used to extract surface, syntactic, semantic, and discourse features, capturing fine-grained linguistic patterns. In parallel, we prompted two LLMs (Llama3.3 70B and DeepSeek-R1 70B) to predict genres at the paragraph level, leveraging their ability to detect contextual and thematic coherence across multiple narrative scales. Statistical analyses using Kruskal–Wallis and Mann–Whitney tests identified genre-defining features at both novel and chapter levels. Integrating these complementary approaches extends microgenre detection beyond what traditional single-label classification can offer. ReaderBench provides quantifiable linguistic evidence, while LLMs capture broader contextual patterns; together, they provide a multi-layered perspective on literary genre that reflects the complex and heterogeneous character of fictional texts. Our results indicate that both language-specific and general-purpose computational tools can effectively detect stylistic diversity in Romanian fiction, opening new avenues for computational literary analysis in low-resource languages.
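As an illustration of how paragraph-level genre predictions could expose genre blending within a single novel, the following minimal Python sketch aggregates per-paragraph labels into a novel-level genre mix. It is not the authors' pipeline: the function names (genre_mix, is_blended), the 0.8 dominance threshold, and the genre labels are all assumptions introduced here for illustration, and the per-paragraph labels stand in for whatever an LLM would return.

```python
from collections import Counter


def genre_mix(paragraph_labels):
    """Return each genre's share of a novel's paragraphs, largest share first."""
    counts = Counter(paragraph_labels)
    total = sum(counts.values())
    # most_common() orders genres by descending paragraph count.
    return {genre: n / total for genre, n in counts.most_common()}


def is_blended(mix, dominance=0.8):
    """Flag a novel whose top genre covers less than `dominance` of its paragraphs.

    The 0.8 threshold is an arbitrary, hypothetical choice.
    """
    return bool(mix) and max(mix.values()) < dominance


# Hypothetical per-paragraph predictions for one novel (made-up labels).
labels = ["historical"] * 5 + ["sentimental"] * 3 + ["adventure"] * 2
mix = genre_mix(labels)
print(mix)
print("blended:", is_blended(mix))
```

Under these assumptions, a novel carrying a single label from literary historians can still show a spread of paragraph-level genres, which is the kind of heterogeneity the abstract describes.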
Keywords: natural language processing; literary microgenres; large language models
JEL-codes: O3
Date: 2025
Downloads:
https://www.mdpi.com/1999-5903/17/9/397/pdf (application/pdf)
https://www.mdpi.com/1999-5903/17/9/397/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:17:y:2025:i:9:p:397-:d:1738235
Future Internet is currently edited by Ms. Grace You
Bibliographic data for series maintained by MDPI Indexing Manager.