Munging the Ghosts in the Machine: Coded Bias and the Craft of Wrangling Archival Data

Vincent Yung and Jeannette Colyvas
Vincent Yung: Northwestern University

No 2dve6, SocArXiv from Center for Open Science

Abstract: Data wrangling is typically treated as an obligatory, codified, and ideally automated step in the machine learning (ML) pipeline. In contrast, we suggest that archival data wrangling is a theory-driven process best understood as a practical craft. Drawing on empirical examples from contemporary computational social science, we identify nine core modes of data wrangling, which can be seen as a sequence but are iterative and nonlinear in practice. Moreover, we discuss how data wrangling can address issues of algorithmic bias. While ML has shifted the focus towards architectural engineering, we assert that to properly engage with machine learning is to properly engage with data wrangling.

Date: 2023-08-18
New Economics Papers: this item is included in nep-big and nep-cmp

Download: https://osf.io/download/64dbe63158c3ca083fe4778e/



Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:2dve6

DOI: 10.31219/osf.io/2dve6


 
Handle: RePEc:osf:socarx:2dve6