Impact of Data Collection on ML Models: Analyzing Differences of Biases Between Low-Versus High-Skilled Annotators

Johannes Schneider, Daniel Eisenhardt, Christian Utama and Christian Meske
Additional contact information
Johannes Schneider: University of Liechtenstein
Daniel Eisenhardt: Ruhr-University Bochum
Christian Utama: Freie Universität Berlin
Christian Meske: Ruhr-University Bochum

A chapter in Solutions and Technologies for Responsible Digitalization, 2025, pp 65-80 from Springer

Abstract: Labeled data is crucial for the success of machine learning-based artificial intelligence. However, companies often face a choice between collecting few annotations from high- or low-skilled annotators, possibly exhibiting different biases. This study investigates differences in biases between datasets labeled by these annotator groups and their impact on machine learning models. To this end, we created datasets annotated by high- and low-skilled annotators, measured the contained biases through entropy, and trained different machine learning models to examine bias inheritance effects. Our findings on text sentiment annotations show that both groups exhibit a considerable amount of bias in their annotations, although there is a significant difference regarding the error types commonly encountered. Models trained on biased annotations produce significantly different predictions, indicating bias propagation, and tend to make more extreme errors than humans. As partial mitigation, we propose and show the efficiency of a hybrid approach where data is labeled by low-skilled and high-skilled workers.
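
Illustrative note: the abstract states that biases were measured through entropy but does not say how. The following is a minimal sketch of one common reading, the Shannon entropy of the labels that several annotators assign to the same text. The function name `label_entropy` and the example annotations are hypothetical and are not taken from the chapter.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution.

    Assumption: disagreement among annotators on one item is used as a
    proxy for annotation noise or bias; the chapter's exact measure may differ.
    """
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical sentiment labels given to one text by four annotators per group.
high_skilled = ["positive", "positive", "positive", "neutral"]
low_skilled = ["positive", "negative", "neutral", "negative"]

h_high = label_entropy(high_skilled)  # lower value: annotators mostly agree
h_low = label_entropy(low_skilled)    # higher value: labels are more spread out
```

Under this reading, an entropy of 0 means all annotators chose the same label, and larger values mean the labels are spread more evenly across classes.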

Keywords: Annotators; Machine learning models; Bias; Labeling
Date: 2025

Persistent link: https://EconPapers.repec.org/RePEc:spr:lnichp:978-3-031-80122-8_5

Ordering information: This item can be ordered from
http://www.springer.com/9783031801228

DOI: 10.1007/978-3-031-80122-8_5

More chapters in Lecture Notes in Information Systems and Organization from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-04-02
Handle: RePEc:spr:lnichp:978-3-031-80122-8_5