Quality aspects of annotated data
Jacob Beck
Additional contact information
Jacob Beck: Ludwig-Maximilians-University Munich
AStA Wirtschafts- und Sozialstatistisches Archiv, 2023, vol. 17, issue 3, No 6, 353 pages
Abstract:
The quality of Machine Learning (ML) applications is commonly assessed by quantifying how well an algorithm fits its respective training data. Yet a perfect model that learns from and reproduces erroneous data will always be flawed in its real-world application. Hence, a comprehensive assessment of ML quality must include an additional data perspective, especially for models trained on human-annotated data. Best practices for collecting human-annotated training data often do not exist, leaving researchers to make arbitrary decisions when collecting annotations. Decisions about the selection of annotators or label options may affect training data quality and model performance. In this paper, I outline and summarize previous research and approaches to the collection of annotated training data. I look at data annotation and its quality confounders from two perspectives: the set of annotators and the strategy of data collection. The paper highlights the various implementations of text and image annotation collection and stresses the importance of careful task construction. I conclude by illustrating the consequences for future research and applications of data annotation. The paper is intended to give readers a starting point on annotated data quality research and to stress to researchers and practitioners the necessity of thoughtful consideration of the annotation collection process.
Keywords: Data quality; Data annotation; Training data; Human annotation; Research synthesis
Date: 2023
Downloads: (external link)
http://link.springer.com/10.1007/s11943-023-00332-y Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:astaws:v:17:y:2023:i:3:d:10.1007_s11943-023-00332-y
Ordering information: This journal article can be ordered from
http://www.springer. ... ce/journal/11943/PS2
DOI: 10.1007/s11943-023-00332-y
AStA Wirtschafts- und Sozialstatistisches Archiv is currently edited by Ralf Münnich
More articles in AStA Wirtschafts- und Sozialstatistisches Archiv from Springer, Deutsche Statistische Gesellschaft - German Statistical Society
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.