Assessing Bias in LLM-Generated Synthetic Datasets: The Case of German Voter Behavior

Leah von der Heyde, Anna-Carolina Haensch and Alexander Wenz
Additional contact information
Leah von der Heyde: LMU Munich
Alexander Wenz: University of Mannheim

No 97r8s, SocArXiv from Center for Open Science

Abstract: The rise of large language models (LLMs) like GPT-3 has sparked interest in their potential for creating synthetic datasets, particularly in the realm of privacy research. This study critically evaluates the use of LLMs in generating synthetic public opinion data, pointing out the biases inherent in the data generation process. While LLMs, trained on vast internet datasets, can mimic societal attitudes and behaviors, their application in synthesizing data poses significant privacy and accuracy challenges. We investigate these issues using the case of vote choice prediction in the 2017 German federal elections. Employing GPT-3, we construct synthetic personas based on the German Longitudinal Election Study, prompting the LLM to predict voting behavior. Our analysis compares these LLM-generated predictions with actual survey data, focusing on the implications of using such synthetic data and the biases it may contain. The results demonstrate GPT-3’s propensity to inaccurately predict voter choices, with biases favoring certain political groups and more predictable voter profiles. This outcome raises critical questions about the reliability and ethical use of LLMs in generating synthetic data.

Date: 2023-12-01
New Economics Papers: this item is included in nep-ain, nep-cmp and nep-pol

Downloads: (external link)
https://osf.io/download/6565abce932b9f449c76064f/

Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:97r8s

DOI: 10.31219/osf.io/97r8s

Page updated 2025-03-19
Handle: RePEc:osf:socarx:97r8s