A Note on the Effect of Data Clustering on the Multiple-Imputation Variance Estimator: A Theoretical Addendum to the Lewis et al. article in JOS 2014
He Yulei,
Shimizu Iris,
Schappert Susan,
Xu Jianmin,
Beresovsky Vladislav,
Khan Diba,
Valverde Roberto and
Schenker Nathaniel
Additional contact information
He Yulei: National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD, 20782, U.S.A.
Shimizu Iris: National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD, 20782, U.S.A.
Schappert Susan: National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD, 20782, U.S.A.
Xu Jianmin: National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD, 20782, U.S.A.
Beresovsky Vladislav: National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD, 20782, U.S.A.
Khan Diba: National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD, 20782, U.S.A.
Valverde Roberto: National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD, 20782, U.S.A.
Schenker Nathaniel: National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD, 20782, U.S.A.
Journal of Official Statistics, 2016, vol. 32, issue 1, 147-164
Abstract:
Multiple imputation is a popular approach to handling missing data. Although it was originally motivated by survey nonresponse problems, it has been readily applied to other data settings. However, its general behavior remains unclear when applied to survey data with complex sample designs, including clustering. Recently, Lewis et al. (2014) compared single- and multiple-imputation analyses for certain incomplete variables in the 2008 National Ambulatory Medical Care Survey, which has a nationally representative, multistage, and clustered sampling design. Their results suggested that the increase in the variance estimate due to multiple imputation, compared with single imputation, largely disappears for estimates with large design effects. We complement their empirical research by providing some theoretical reasoning. We consider data sampled from an equally weighted, single-stage cluster design and characterize the process using a balanced, one-way normal random-effects model. Assuming that the missingness is completely at random, we derive analytic expressions for the within- and between-multiple-imputation variance estimators for the mean estimator, and thus reveal the impact of design effects on these variance estimators. We propose approximations for the fraction of missing information in clustered samples, extending previous results for simple random samples. We discuss some generalizations of this research and its practical implications for data release by statistical agencies.
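The quantities named in the abstract (within- and between-imputation variance, fraction of missing information, design effect) can be illustrated with a minimal sketch. This is not the article's derivation: it only applies the standard Rubin combining rules and the usual equal-cluster-size design-effect approximation, 1 + (n - 1) * rho. The function names and the input numbers are hypothetical, and the completed-data variances passed in are assumed to have been computed with a design-based (cluster-aware) estimator.

```python
import numpy as np

def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (n - 1) * rho.

    Assumption: equally weighted, single-stage cluster sample with a common
    cluster size and intraclass correlation `icc`.
    """
    return 1.0 + (cluster_size - 1) * icc

def rubin_combine(estimates, variances):
    """Combine M completed-data results with Rubin's rules.

    estimates: point estimates of the mean, one per imputed data set
    variances: completed-data variance estimates, one per imputed data set
    Returns (pooled estimate, within variance, between variance,
             total variance, approximate fraction of missing information).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                 # pooled point estimate
    within = u.mean()                # within-imputation variance
    between = q.var(ddof=1)          # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    # Simple approximation that ignores the degrees-of-freedom correction.
    fmi = (1.0 + 1.0 / m) * between / total
    return q_bar, within, between, total, fmi

# Hypothetical inputs: five imputations, equal completed-data variances.
q_bar, within, between, total, fmi = rubin_combine(
    [10.0, 10.4, 9.8, 10.2, 10.1], [0.5] * 5
)
print(q_bar, within, between, total, fmi)
print(design_effect(cluster_size=20, icc=0.05))
```

In this sketch a larger design effect inflates the completed-data (within) variance, so for a fixed between-imputation variance the share of the total variance attributable to imputation shrinks, which is the qualitative pattern the abstract describes.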
Keywords: Bayesian; complex survey design; data release; exploratory data analysis; fraction of missing information; missing data
Date: 2016
Downloads: https://doi.org/10.1515/jos-2016-0007 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:vrs:offsta:v:32:y:2016:i:1:p:147-164:n:7
DOI: 10.1515/jos-2016-0007
Journal of Official Statistics is currently edited by Annica Isaksson and Ingegerd Jansson
More articles in Journal of Official Statistics from Sciendo
Bibliographic data for series maintained by Peter Golla.