A primer for the use of classifier and generative large language models in social science research
Joshua Cova and Luuk Schmitz
No r3qng, OSF Preprints from Center for Open Science
Abstract:
The emergence of generative AI models is rapidly changing the social sciences. Much has now been written on the ethics and epistemological considerations of using these tools. Meanwhile, AI-powered research increasingly makes its way to preprint servers. However, we see a gap between ethics and practice: while many researchers would like to use these tools, few if any guides on how to do so exist. This paper fills this gap by providing users with a hands-on application written in accessible language. The paper deals with what we consider the most likely and advanced use case for AI in the social sciences: text annotation and classification. Our application guides readers through setting up a text classification pipeline and evaluating the results. The most important considerations concern reproducibility and transparency, open-source versus closed-source models, as well as the difference between classifier and generative models. The take-home message is this: these models provide unprecedented scale to augment research, but the community must take open-source and locally deployable models seriously in the interest of open science principles. Our code to reproduce the example can be accessed via GitHub.
Date: 2024-12-20
New Economics Papers: this item is included in nep-ain, nep-big and nep-cmp
Download: https://osf.io/download/6764b5f734e328c181af0fed/
Persistent link: https://EconPapers.repec.org/RePEc:osf:osfxxx:r3qng
DOI: 10.31219/osf.io/r3qng