EconPapers    

Using Large Language Models for Text Annotation in Social Science and Humanities: A Hands-On Python/R Tutorial

Qixiang Fang, Javier Garcia-Bernardo and Erik-Jan van Kesteren

No. v4eq6_v1, SocArXiv from the Center for Open Science

Abstract: Large language models (LLMs) have become an essential tool for social science and humanities (SSH) researchers who work with textual data. One particularly valuable use case is automating text annotation, traditionally a time-consuming step in preparing data for empirical analysis. Yet many SSH researchers face two challenges: getting started with LLMs, and understanding how to evaluate and correct for their limitations. The rapid pace of model development can make LLMs seem inaccessible or intimidating, while even experienced users may overlook how annotation errors can bias the results of downstream analyses (e.g., regression estimates, p-values), even when accuracy appears high. This tutorial provides a step-by-step, hands-on guide to using LLMs for text annotation in SSH research, for both Python and R users. We cover (1) how to choose and access LLM APIs, (2) how to design and run annotation tasks programmatically, (3) how to evaluate annotation quality and iterate on prompts, (4) how to integrate annotations into statistical workflows while accounting for uncertainty, and (5) how to manage cost, efficiency, and reproducibility. Throughout, we provide concrete examples, code snippets, and best-practice checklists to help researchers confidently and transparently incorporate LLM-based annotation into their workflows.

Date: 2025-11-13

Downloads: https://osf.io/download/6914b4a59613fab946e2f591/

Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:v4eq6_v1

DOI: 10.31219/osf.io/v4eq6_v1

Page updated 2025-11-16
Handle: RePEc:osf:socarx:v4eq6_v1