Munging the Ghosts in the Machine: Coded Bias and the Craft of Wrangling Archival Data
Vincent Yung and Jeannette Colyvas
Additional contact information
Vincent Yung: Northwestern University
No 2dve6, SocArXiv from Center for Open Science
Abstract:
Data wrangling is typically treated as an obligatory, codified, and ideally automated step in the machine learning (ML) pipeline. In contrast, we suggest that archival data wrangling is a theory-driven process best understood as a practical craft. Drawing on empirical examples from contemporary computational social science, we identify nine core modes of data wrangling, which can be seen as a sequence but are iterative and nonlinear in practice. Moreover, we discuss how data wrangling can address issues of algorithmic bias. While ML has shifted the focus towards architectural engineering, we assert that to properly engage with machine learning is to properly engage with data wrangling.
Date: 2023-08-18
New Economics Papers: this item is included in nep-big and nep-cmp
Downloads: https://osf.io/download/64dbe63158c3ca083fe4778e/
Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:2dve6
DOI: 10.31219/osf.io/2dve6