Improved methods for the imputation of missing data by nearest neighbor methods

Tutz, Gerhard; Ramzan, Shahla

Computational Statistics & Data Analysis, 2015, vol. 90, issue C, 84-99

Abstract: Missing data raise problems in almost all fields of quantitative research. A useful nonparametric procedure is the nearest neighbor imputation method. Improved versions of this method are presented. First, a weighted nearest neighbor imputation method based on Lq distances is proposed. It is demonstrated that the method tends to have a smaller imputation error than other nearest neighbor estimates. Then weighted nearest neighbor imputation methods that use distances for selected covariates are considered. The careful selection of distances that carry information about the missing values yields an imputation tool that can outperform competing nearest neighbor methods. This approach performs well, especially when the number of predictors is large. The methods are evaluated in simulation studies and with several real data sets from different fields.
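As a rough illustration of the first idea in the abstract, the following is a minimal sketch of weighted nearest neighbor imputation with an Lq distance. It is not the authors' implementation. The Gaussian kernel, the averaging of the distance over jointly observed covariates, and the names `lq_distance`, `weighted_knn_impute`, `k`, `q`, and `lam` are all assumptions made for this example. In practice the tuning parameters would be chosen by cross-validation, as the keywords suggest.

```python
import numpy as np

def lq_distance(a, b, q=2.0):
    """Lq distance over coordinates observed in both rows (assumed form).

    The sum is averaged over the number of shared coordinates so that
    rows with different missingness patterns stay comparable.
    """
    shared = ~np.isnan(a) & ~np.isnan(b)
    m = shared.sum()
    if m == 0:
        return np.inf  # no common information, cannot compare
    return (np.sum(np.abs(a[shared] - b[shared]) ** q) / m) ** (1.0 / q)

def weighted_knn_impute(X, k=3, q=2.0, lam=1.0):
    """Fill NaN entries with a kernel-weighted mean of the k nearest donors.

    k, q and lam (kernel bandwidth) are hypothetical tuning parameters.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    n = X.shape[0]
    for i in range(n):
        missing_cols = np.where(np.isnan(X[i]))[0]
        if missing_cols.size == 0:
            continue
        # Distance from row i to every other row; a row is never its own donor.
        d = np.array([np.inf if j == i else lq_distance(X[i], X[j], q)
                      for j in range(n)])
        for s in missing_cols:
            # Donors must have column s observed and a finite distance.
            donors = np.where(~np.isnan(X[:, s]) & np.isfinite(d))[0]
            if donors.size == 0:
                continue  # leave the entry missing
            nearest = donors[np.argsort(d[donors])[:k]]
            # Gaussian kernel weights (an assumption): closer donors count more.
            w = np.exp(-0.5 * (d[nearest] / lam) ** 2)
            if w.sum() == 0.0:
                w = np.ones_like(w)  # fall back to an unweighted mean
            out[i, s] = np.sum(w * X[nearest, s]) / w.sum()
    return out

X = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, np.nan],
              [10.0, 20.0, 30.0]])
X_imputed = weighted_knn_impute(X, k=2, q=2.0, lam=1.0)
```

In this toy matrix the second row matches the first row on both observed covariates, so the first row receives almost all of the weight and the imputed value lies very close to that row's observed entry. The second idea in the abstract, restricting the distance to selected covariates, could be sketched by computing `lq_distance` only over a chosen subset of columns; how that subset is selected is not specified here.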

Keywords: Kernel function; Weighted nearest neighbors; Cross-validation; Weighted imputation; MCAR
Date: 2015

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167947315001061
Full text for ScienceDirect subscribers only.

Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:90:y:2015:i:c:p:84-99

DOI: 10.1016/j.csda.2015.04.009

Computational Statistics & Data Analysis is currently edited by S.P. Azen