Assessing Bias in LLM-Generated Synthetic Datasets: The Case of German Voter Behavior
Leah von der Heyde,
Anna-Carolina Haensch and
Alexander Wenz
Additional contact information
Leah von der Heyde: LMU Munich
Alexander Wenz: University of Mannheim
No 97r8s, SocArXiv from Center for Open Science
Abstract:
The rise of large language models (LLMs) like GPT-3 has sparked interest in their potential for creating synthetic datasets, particularly in the realm of privacy research. This study critically evaluates the use of LLMs in generating synthetic public opinion data, pointing out the biases inherent in the data generation process. While LLMs, trained on vast internet datasets, can mimic societal attitudes and behaviors, their application in synthesizing data poses significant privacy and accuracy challenges. We investigate these issues using the case of vote choice prediction in the 2017 German federal election. Employing GPT-3, we construct synthetic personas based on the German Longitudinal Election Study, prompting the LLM to predict voting behavior. Our analysis compares these LLM-generated predictions with actual survey data, focusing on the implications of using such synthetic data and the biases it may contain. The results demonstrate GPT-3’s propensity to inaccurately predict voter choices, with biases favoring certain political groups and more predictable voter profiles. This outcome raises critical questions about the reliability and ethical use of LLMs in generating synthetic data.
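The comparison described in the abstract (persona prompts in, predicted vote choices out, then a check against survey responses) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the prompt template, the persona fields (`age`, `gender`, `state`), the function names, and the party labels are all hypothetical, and the toy lists at the end are placeholders rather than results from the study.

```python
from collections import Counter


def build_persona_prompt(persona: dict) -> str:
    """Turn one respondent's attributes into a prompt (hypothetical template)."""
    return (
        f"I am {persona['age']} years old and {persona['gender']}. "
        f"I live in {persona['state']}. "
        "In the 2017 German federal election, I voted for"
    )


def vote_shares(votes: list) -> dict:
    """Share of each party among a list of vote choices."""
    counts = Counter(votes)
    total = len(votes)
    return {party: n / total for party, n in counts.items()}


def share_bias(predicted: list, observed: list) -> dict:
    """Predicted minus observed vote share per party (positive = over-predicted)."""
    pred = vote_shares(predicted)
    obs = vote_shares(observed)
    parties = sorted(set(pred) | set(obs))
    return {p: pred.get(p, 0.0) - obs.get(p, 0.0) for p in parties}


def accuracy(predicted: list, observed: list) -> float:
    """Fraction of respondents whose predicted choice matches the survey answer."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy placeholder labels (NOT data from the paper), one entry per respondent.
observed = ["Party A", "Party B", "Party C", "Party D"]
predicted = ["Party A", "Party D", "Party A", "Party D"]

prompt = build_persona_prompt({"age": 45, "gender": "female", "state": "Bavaria"})
individual_accuracy = accuracy(predicted, observed)
aggregate_bias = share_bias(predicted, observed)
```

Separating individual-level accuracy from aggregate share bias matters because a model can reproduce overall vote shares reasonably well while still misclassifying many individual respondents, or the reverse.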
Date: 2023-12-01
New Economics Papers: this item is included in nep-ain, nep-cmp and nep-pol
Downloads: (external link)
https://osf.io/download/6565abce932b9f449c76064f/
Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:97r8s
DOI: 10.31219/osf.io/97r8s