
Wrangling Categorical Data in R

Amelia McNamara and Nicholas Horton

The American Statistician, 2018, vol. 72, issue 1, 97-104

Abstract: Data wrangling is a critical foundation of data science, and wrangling of categorical data is an important component of this process. However, categorical data can introduce unique issues in data wrangling, particularly in real-world settings with collaborators and periodically-updated dynamic data. This article discusses common problems arising from categorical variable transformations in R, demonstrates the use of factors, and suggests approaches to address data wrangling challenges. For each problem, we present at least two strategies for management, one in base R and the other from the “tidyverse.” We consider several motivating examples, suggest defensive coding strategies, and outline principles for data wrangling to help ensure data quality and sound analysis. Supplementary materials for this article are available online.

Date: 2018

Downloads: (external link)
http://hdl.handle.net/10.1080/00031305.2017.1356375 (text/html)
Access to full text is restricted to subscribers.


Persistent link: https://EconPapers.repec.org/RePEc:taf:amstat:v:72:y:2018:i:1:p:97-104


DOI: 10.1080/00031305.2017.1356375

The American Statistician is currently edited by Eric Sampson

Bibliographic data for series maintained by Chris Longhurst.

 
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:97-104