GPT Models for Text-Annotation: An Empirical Exploration in Public Policy Research
Alexander Churchill,
Shamitha Pichika,
Chengxin Xu and
Ying Liu
Additional contact information
Chengxin Xu: Seattle University
Ying Liu: Rutgers University
No 6fpgj_v1, SocArXiv from Center for Open Science
Abstract:
Text annotation, the practice of labeling text following a predetermined scheme, is essential to qualitative researchers in public policy. Despite its utility, text annotation for policy research faces the challenge of high labor and time costs, particularly when the qualitative dataset is large. Recent developments in large language models (LLMs), specifically models based on generative pre-trained transformers (GPTs), suggest an approach that may alleviate the burden of manual annotation. In this report, we test whether OpenAI's GPT-3.5 and GPT-4 models can be employed for text annotation tasks and measure the results of different prompting strategies against manual annotation. Using email messages collected from a national correspondence experiment in the U.S. nursing home market as an example, we find an average percentage agreement of 86.25% between GPT and human annotations. We also show that GPT models have context-based limitations. Our report ends with suggestions, guidance, and reflections for readers who are interested in using GPT models for text annotation.
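The abstract's headline metric, percentage agreement, is simply the share of items on which the GPT annotation matches the human annotation. A minimal sketch of that computation (illustrative only; the label names and lists below are invented, not the paper's data):

```python
# Illustrative sketch, not the authors' code: percentage agreement
# between GPT-assigned and human-assigned labels over the same items.
def percentage_agreement(gpt_labels, human_labels):
    """Return the share (in %) of items where both annotators agree."""
    if len(gpt_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    matches = sum(g == h for g, h in zip(gpt_labels, human_labels))
    return 100 * matches / len(gpt_labels)

# Hypothetical annotations for four email messages.
gpt = ["responsive", "responsive", "unresponsive", "responsive"]
human = ["responsive", "unresponsive", "unresponsive", "responsive"]
print(f"{percentage_agreement(gpt, human):.2f}%")  # 75.00%
```

Note that percentage agreement does not correct for chance agreement; chance-corrected statistics such as Cohen's kappa are a common complement for annotation studies.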
Date: 2024-01-25
New Economics Papers: this item is included in nep-cmp
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://osf.io/download/65b191e7b1f2b5065eb0eb38/
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:6fpgj_v1
DOI: 10.31219/osf.io/6fpgj_v1
Access Statistics for this paper
More papers in SocArXiv from Center for Open Science
Bibliographic data for series maintained by OSF.