Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations
Nicholas Tierney and
Dianne Cook
No 14/18, Monash Econometrics and Business Statistics Working Papers from Monash University, Department of Econometrics and Business Statistics
Abstract:
Despite the large body of research on missing value distributions and imputation, there is comparatively little literature on how to make it easy to handle, explore, and impute missing values in data. This paper addresses this gap. The new methodology builds upon tidy data principles, with the goal of making missing value handling an integral part of data analysis workflows. New data structures are defined, along with new functions (verbs) to perform common operations. Together these provide a cohesive framework for handling, exploring, and imputing missing values. These methods have been made available in the R package naniar.
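As a rough illustration of the abstract's idea of pairing a data structure with small verb-like functions, the sketch below tracks missingness alongside the data and summarises it in tidy form (one record per variable). It is written in Python rather than R, and it is not the naniar API: the function names `add_missing_indicators` and `count_missing`, the `_NA` column suffix, the `"NA"`/`"!NA"` labels, and the sample data are all assumptions made for this example.

```python
# Illustrative sketch only: every name here is hypothetical, not the naniar API.
# Missing values are represented by None.

def add_missing_indicators(rows, suffix="_NA"):
    """Return copies of the rows with one indicator column per variable."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left unchanged
        for name, value in row.items():
            new_row[name + suffix] = "NA" if value is None else "!NA"
        out.append(new_row)
    return out


def count_missing(rows):
    """Return a tidy summary: one record per variable with its missing count."""
    names = list(rows[0].keys()) if rows else []
    return [
        {"variable": name, "n_miss": sum(1 for row in rows if row[name] is None)}
        for name in names
    ]


# Made-up sample data for the example.
data = [
    {"temp": 21.5, "ozone": None},
    {"temp": None, "ozone": 41.0},
    {"temp": 19.0, "ozone": 36.0},
]

augmented = add_missing_indicators(data)
summary = count_missing(data)
```

Keeping the indicators in the same table as the values means later steps, such as plotting or checking imputed values, can still tell which entries were originally missing.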
Keywords: workflow; statistical computing; data science; data visualization; tidyverse; data pipeline
JEL-codes: C10 C14 C22
Pages: 41
Date: 2018
Citations: 2 (in EconPapers)
Downloads:
https://www.monash.edu/business/ebs/research/publications/ebs/wp14-2018.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:msh:ebswps:2018-14
Ordering information: This working paper can be ordered from
http://business.mona ... -business-statistics
More papers in Monash Econometrics and Business Statistics Working Papers from Monash University, Department of Econometrics and Business Statistics PO Box 11E, Monash University, Victoria 3800, Australia. Contact information at EDIRC.
Bibliographic data for series maintained by Professor Xibin Zhang.