Open-source LLMs for text annotation: a practical guide for model setting and fine-tuning

Alizadeh, Meysam; Kubli, Maël; Samei, Zeynab; Dehghani, Shirin; Zahedivafa, Mohammadmasiha; Bermeo, Juan D.; Korobeynikova, Maria; Gilardi, Fabrizio

Affiliations:
Meysam Alizadeh: University of Zurich
Maël Kubli: University of Zurich
Zeynab Samei: Institute for Fundamental Research
Shirin Dehghani: Allameh Tabataba’i University
Mohammadmasiha Zahedivafa: Iran University of Science and Technology
Juan D. Bermeo: University of Zurich
Maria Korobeynikova: University of Zurich
Fabrizio Gilardi: University of Zurich

Journal of Computational Social Science, 2025, vol. 8, issue 1, No 17, 25 pages

Abstract: This paper studies the performance of open-source Large Language Models (LLMs) on text classification tasks typical of political science research. By examining tasks such as stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis and to establish a baseline performance benchmark that demonstrates the models' effectiveness. Specifically, we assess both zero-shot and fine-tuned LLMs across a range of text annotation tasks using datasets of news articles and tweets. Our analysis shows that fine-tuning improves the performance of open-source LLMs, allowing them to match or even surpass zero-shot GPT-3.5 and GPT-4, though they still lag behind fine-tuned GPT-3.5. We further establish that fine-tuning on a relatively modest quantity of annotated text is preferable to few-shot training. Our findings show that fine-tuned open-source LLMs can be effectively deployed in a broad spectrum of text annotation applications. We provide a Python notebook to help other researchers apply LLMs to text annotation.

Keywords: ChatGPT; LLMs; Open source; FLAN; LLaMA; NLP; Text annotation
Date: 2025

Downloads: http://link.springer.com/10.1007/s42001-024-00345-9 (abstract, text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:jcsosc:v:8:y:2025:i:1:d:10.1007_s42001-024-00345-9

Ordering information: This journal article can be ordered from
http://www.springer. ... iences/journal/42001

DOI: 10.1007/s42001-024-00345-9

Journal of Computational Social Science is currently edited by Takashi Kamihigashi

More articles in Journal of Computational Social Science from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.