An Empirical Comparison of Multiple Imputation Methods for Categorical Data
Olanrewaju Akande, Fan Li and Jerome Reiter
The American Statistician, 2017, vol. 71, issue 2, 162-170
Abstract:
Multiple imputation is a common approach for dealing with missing values in statistical databases. The imputer fills in missing values with draws from predictive models estimated from the observed data, resulting in multiple, completed versions of the database. Researchers have developed a variety of default routines to implement multiple imputation; however, there has been limited research comparing the performance of these methods, particularly for categorical data. We use simulation studies to compare repeated sampling properties of three default multiple imputation methods for categorical data, including chained equations using generalized linear models, chained equations using classification and regression trees, and a fully Bayesian joint distribution based on Dirichlet process mixture models. We base the simulations on categorical data from the American Community Survey. In the circumstances of this study, the results suggest that default chained equations approaches based on generalized linear models are dominated by the default regression tree and Bayesian mixture model approaches. They also suggest competing advantages for the regression tree and Bayesian mixture model approaches, making both reasonable default engines for multiple imputation of categorical data. Supplementary material for this article is available online.
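To make the idea in the abstract concrete, here is a minimal sketch of how an imputer might fill in a single categorical variable with draws from a model estimated on the observed values, producing several completed copies of the data. It is not one of the three methods compared in the article. The variable, the categories, the uniform Dirichlet prior, and the function names `impute_once` and `multiply_impute` are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def impute_once(values, rng):
    """Return one completed copy of `values`, filling None entries."""
    observed = [v for v in values if v is not None]
    categories = sorted(set(observed))
    counts = Counter(observed)
    # Draw category probabilities from a Dirichlet posterior (uniform prior,
    # an assumption) by normalizing independent gamma draws. This carries
    # uncertainty about the estimated model into the imputations.
    gammas = [rng.gammavariate(counts[c] + 1, 1.0) for c in categories]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    # Keep observed values; replace each missing value with a random draw.
    return [
        v if v is not None else rng.choices(categories, weights=probs)[0]
        for v in values
    ]


def multiply_impute(values, m=5, seed=0):
    """Return m completed versions of `values` (hypothetical helper)."""
    rng = random.Random(seed)
    return [impute_once(values, rng) for _ in range(m)]


# Toy data: a categorical variable with three missing entries (None).
data = ["own", "rent", None, "own", None, "rent", "own", None]
completed = multiply_impute(data, m=5)
for version in completed:
    print(version)
```

Each completed version keeps the observed entries unchanged and differs only in the imputed ones. The approaches studied in the article condition on the other variables as well, which this one-variable sketch does not attempt.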
Date: 2017
Full text (restricted to subscribers): http://hdl.handle.net/10.1080/00031305.2016.1277158
Persistent link: https://EconPapers.repec.org/RePEc:taf:amstat:v:71:y:2017:i:2:p:162-170
DOI: 10.1080/00031305.2016.1277158
The American Statistician is currently edited by Eric Sampson