Misguided Use of Observed Covariates to Impute Missing Covariates in Conditional Prediction: A Shrinkage Problem

Manski, Charles; Gmeiner, Michael; Tamburc, Anat

Misguided Use of Observed Covariates to Impute Missing Covariates in Conditional Prediction: A Shrinkage Problem

Charles Manski, Michael Gmeiner and Anat Tamburc

Abstract: Researchers regularly perform conditional prediction using imputed values of missing data. However, applications of imputation often lack a firm foundation in statistical theory. This paper originated when we were unable to find analysis substantiating claims that imputation of missing data has good frequentist properties when data are missing at random (MAR). We focused on the use of observed covariates to impute missing covariates when estimating conditional means of the form E(y|x, w). Here y is an outcome whose realizations are always observed, x is a covariate whose realizations are always observed, and w is a covariate whose realizations are sometimes unobserved. We examine the probability limit of simple imputation estimates of E(y|x, w) as sample size goes to infinity. We find that these estimates are not consistent when covariate data are MAR. To the contrary, the estimates suffer from a shrinkage problem. They converge to points intermediate between the conditional mean of interest, E(y|x, w), and the mean E(y|x) that conditions only on x. We use a type of genotype imputation to illustrate.

Date: 2021-02
New Economics Papers: this item is included in nep-ecm
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
http://arxiv.org/pdf/2102.11334 Latest version (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2102.11334

Access Statistics for this paper

More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().