A Comprehensive Simulation Study on the Forward Imputation
Nadia Solaro,
Alessandro Barbiero,
Giancarlo Manzi and
Pier Alda Ferrari
Departmental Working Papers from Department of Economics, Management and Quantitative Methods at Università degli Studi di Milano
Abstract:
The Nearest Neighbour Imputation (NNI) method has a long history in missing data imputation. Likewise, multivariate dimensional reduction techniques allow for preserving the maximum information from the data. Recently, the combined use of these methodologies has been proposed to solve data imputation problems and exploit as much information as possible from the complete part of the data. In this paper we perform an extensive simulation study to test the performance of this new imputation approach (called “Forward Imputation” - ForImp). We compare the two ForImp methods developed for missing quantitative data with two other imputation techniques regarded as benchmarks. The first ForImp method, ForImpPCA, combines the NNI method with Principal Component Analysis (PCA) as a multivariate data analysis technique; the second, ForImpMahalanobis, uses the Mahalanobis distance for NNI. The benchmarks are Stekhoven and Bühlmann’s missForest method, a nonparametric imputation technique for continuous and/or categorical data based on a random forest, and the Iterative PCA, an algorithmic-type technique that imputes missing values simultaneously by an iterative use of PCA. The simulation study is based on constructing simulated data with different levels of kurtosis or skewness and strength of linear relationship of variables, so that the performance of the four methods can be compared on various data patterns. Distributions used for these simulated data belong to the families of Multivariate Exponential Power and Multivariate Skew-Normal distributions, respectively. Results tend to favour ForImpMahalanobis, especially in the presence of skew data with small or negative correlations of the same magnitude, or a mix of negative and positive correlations of low level, whereas ForImpPCA works better when a slightly higher level of correlations is present in the data.
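To make the idea of nearest-neighbour imputation with the Mahalanobis distance concrete, the following is a minimal Python sketch. It is a generic single-pass donor imputation and not the ForImp procedure studied in the paper, whose sequential details are not given in the abstract. The function name `nn_impute_mahalanobis`, the use of complete rows as the only donors, and the fallback to column means are all assumptions made for illustration.

```python
import numpy as np

def nn_impute_mahalanobis(X):
    """Fill NaNs by copying values from the nearest complete row (donor).

    Illustrative sketch only: distances are Mahalanobis distances computed
    on the variables observed in the incomplete row, with the covariance
    matrix estimated from the complete rows.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    complete = ~np.isnan(X).any(axis=1)
    donors = X[complete]
    if donors.shape[0] < 2:
        raise ValueError("need at least two complete rows as donors")
    # Covariance of the complete part of the data (p x p).
    cov = np.atleast_2d(np.cov(donors, rowvar=False))
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])
        if not obs.any():
            # Assumption: with nothing observed, fall back to donor means.
            out[i] = donors.mean(axis=0)
            continue
        # Pseudo-inverse guards against a singular covariance sub-matrix.
        inv = np.linalg.pinv(cov[np.ix_(obs, obs)])
        diff = donors[:, obs] - X[i, obs]
        # Squared Mahalanobis distance from row i to every donor.
        d2 = np.einsum("ij,jk,ik->i", diff, inv, diff)
        nearest = donors[np.argmin(d2)]
        out[i, ~obs] = nearest[~obs]
    return out
```

Calling `nn_impute_mahalanobis` on an array with `np.nan` entries returns a copy in which each incomplete row has borrowed its missing values from the closest complete row; complete rows are left unchanged.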
Keywords: Correlation; Data patterns; Kurtosis; Mahalanobis distance; MissForest; Nearest Neighbour Imputation; Principal Component Analysis; Skewness
JEL-codes: C15 C18 C38 C63
Date: 2015-02-23
Downloads:
http://wp.demm.unimi.it/files/wp/2015/DEMM-2015_04wp.pdf (application/pdf)
http://wp.demm.unimi.it/files/wp/2015/DEMM-2015_04wp_ESM.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:mil:wpdepa:2015-04
Series: Departmental Working Papers, Department of Economics, Management and Quantitative Methods, Università degli Studi di Milano, Via Conservatorio 7, I-20122 Milan - Italy.