Misguided Use of Observed Covariates to Impute Missing Covariates in Conditional Prediction: A Shrinkage Problem
Charles Manski, Michael Gmeiner and Anat Tamburc
Papers from arXiv.org
Abstract:
Researchers regularly perform conditional prediction using imputed values of missing data. However, applications of imputation often lack a firm foundation in statistical theory. This paper originated when we were unable to find analysis substantiating claims that imputation of missing data has good frequentist properties when data are missing at random (MAR). We focused on the use of observed covariates to impute missing covariates when estimating conditional means of the form E(y|x, w). Here y is an outcome whose realizations are always observed, x is a covariate whose realizations are always observed, and w is a covariate whose realizations are sometimes unobserved. We examine the probability limit of simple imputation estimates of E(y|x, w) as sample size goes to infinity. We find that these estimates are not consistent when covariate data are MAR. To the contrary, the estimates suffer from a shrinkage problem. They converge to points intermediate between the conditional mean of interest, E(y|x, w), and the mean E(y|x) that conditions only on x. We use a type of genotype imputation to illustrate.
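The shrinkage result described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code or their exact estimator. It assumes one particular simple imputation rule (drawing each missing w from the distribution of w estimated among complete cases), a single value of x, a binary w, and an outcome with E(y|x, w) = w. The function name `simulate_shrinkage` and all parameter values are hypothetical choices made for illustration only.

```python
import random


def simulate_shrinkage(n=200_000, p_w=0.5, p_miss=0.4, seed=0):
    """Simulate one x-stratum and compare estimates of E(y | x, w = 1).

    Assumed design (illustrative only): w is binary with P(w = 1 | x) = p_w,
    y = w + noise, and w is missing with probability p_miss independently
    of (y, w), so the missing-at-random assumption holds.
    """
    rng = random.Random(seed)

    # True E(y | x, w) equals w, so E(y | x, w = 1) = 1 and E(y | x) = p_w.
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < p_w else 0
        y = w + rng.gauss(0.0, 1.0)
        w_missing = rng.random() < p_miss
        rows.append((y, w, w_missing))

    # Estimate P(w = 1 | x) from the cases where w is observed.
    observed_w = [w for _, w, miss in rows if not miss]
    p_hat = sum(observed_w) / len(observed_w)

    cc_sum = cc_n = 0.0    # complete cases with w = 1
    imp_sum = imp_n = 0.0  # observed-or-imputed cases with w = 1
    for y, w, miss in rows:
        if miss:
            # Assumed imputation rule: draw w from its estimated
            # distribution given x, ignoring y.
            w_used = 1 if rng.random() < p_hat else 0
        else:
            w_used = w
            if w == 1:
                cc_sum += y
                cc_n += 1
        if w_used == 1:
            imp_sum += y
            imp_n += 1

    return {
        "mean_y_given_x": sum(y for y, _, _ in rows) / n,
        "complete_case": cc_sum / cc_n,
        "imputed": imp_sum / imp_n,
    }


if __name__ == "__main__":
    for name, value in simulate_shrinkage().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the imputed cases assigned w = 1 have outcome mean E(y|x) = 0.5, while the observed cases with w = 1 have mean 1. The imputation-based estimate therefore has probability limit (1 - 0.4)(1) + (0.4)(0.5) = 0.8, which lies between E(y|x) and E(y|x, w = 1). The complete-case average, by contrast, converges to 1 in this design.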
Date: 2021-02
New Economics Papers: this item is included in nep-ecm
Downloads: (external link)
http://arxiv.org/pdf/2102.11334 Latest version (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2102.11334