Open-source LLMs for text annotation: a practical guide for model setting and fine-tuning
Meysam Alizadeh,
Maël Kubli,
Zeynab Samei,
Shirin Dehghani,
Mohammadmasiha Zahedivafa,
Juan D. Bermeo,
Maria Korobeynikova and
Fabrizio Gilardi
Additional contact information
Meysam Alizadeh: University of Zurich
Maël Kubli: University of Zurich
Zeynab Samei: Institute for Fundamental Research
Shirin Dehghani: Allameh Tabataba’i University
Mohammadmasiha Zahedivafa: Iran University of Science and Technology
Juan D. Bermeo: University of Zurich
Maria Korobeynikova: University of Zurich
Fabrizio Gilardi: University of Zurich
Journal of Computational Social Science, 2025, vol. 8, issue 1, No 17, 25 pages
Abstract:
This paper studies the performance of open-source Large Language Models (LLMs) in text classification tasks typical for political science research. By examining tasks like stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis and to establish a baseline performance benchmark that demonstrates the models’ effectiveness. Specifically, we conduct an assessment of both zero-shot and fine-tuned LLMs across a range of text annotation tasks using news articles and tweets datasets. Our analysis shows that fine-tuning improves the performance of open-source LLMs, allowing them to match or even surpass zero-shot GPT-3.5 and GPT-4, though still lagging behind fine-tuned GPT-3.5. We further establish that fine-tuning is preferable to few-shot training with a relatively modest quantity of annotated text. Our findings show that fine-tuned open-source LLMs can be effectively deployed in a broad spectrum of text annotation applications. We provide a Python notebook facilitating the application of LLMs in text annotation for other researchers.
Keywords: ChatGPT; LLMs; Open source; FLAN; LLaMA; NLP; Text annotation
Date: 2025
Downloads:
http://link.springer.com/10.1007/s42001-024-00345-9 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:jcsosc:v:8:y:2025:i:1:d:10.1007_s42001-024-00345-9
Ordering information: This journal article can be ordered from
http://www.springer. ... iences/journal/42001
DOI: 10.1007/s42001-024-00345-9
Journal of Computational Social Science is currently edited by Takashi Kamihigashi
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.