Nonparametric imputation by data depth
Pavlo Mozharovskyi,
Julie Josse and
François Husson
Additional contact information
Pavlo Mozharovskyi: CREST; ENSAI; Université Bretagne Loire
Julie Josse: CMAP; Ecole polytechnique
François Husson: IRMAR; Applied Mathematics Unit; Agrocampus Ouest
No 2017-72, Working Papers from Center for Research in Economics and Statistics
The presented methodology for single imputation of missing values borrows the idea of data depth, a measure of centrality defined for an arbitrary point of the space with respect to a probability distribution or a data cloud. It consists in iteratively maximizing the depth of each observation with missing values, and can be employed with any properly defined statistical depth function. At each iteration, imputation reduces to the optimization of a quadratic, linear, or quasiconcave function, solved analytically, by linear programming, or by the Nelder-Mead method, respectively. Because it captures the underlying data topology, the procedure is distribution free, imputes close to the data, preserves prediction possibilities, unlike local imputation methods (k-nearest neighbors, random forest), and has attractive robustness and asymptotic properties under elliptical symmetry. Its particular case, obtained with the Mahalanobis depth, is shown to be directly connected to well-known treatments for the multivariate normal model, such as iterated regression or regularized PCA. The methodology is extended to multiple imputation for data stemming from an elliptically symmetric distribution. In simulation and real data studies, the procedure compares favorably with popular existing alternatives. The method has been implemented as an R package.
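The Mahalanobis-depth case mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' R package. It assumes that maximizing Mahalanobis depth over the missing coordinates of a row amounts to minimizing the Mahalanobis distance to the current mean, which has the closed-form solution used in iterated regression. The function name `impute_mahalanobis_depth`, the column-mean initialization, the plain moment estimates, and the stopping rule are all illustrative assumptions.

```python
import numpy as np

def impute_mahalanobis_depth(X, max_iter=100, tol=1e-8):
    """Hypothetical sketch: single imputation by iterative
    maximization of Mahalanobis depth (NaN marks a missing value).
    Assumes at least two columns and no column that is entirely missing."""
    X = np.array(X, dtype=float)          # work on a copy
    miss = np.isnan(X)
    # Assumed initialization: column means of the observed entries.
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])

    rows_with_missing = np.where(miss.any(axis=1))[0]
    for _ in range(max_iter):
        X_old = X.copy()
        # Moment estimates from the currently completed data.
        mu = X.mean(axis=0)
        sigma = np.cov(X, rowvar=False)
        for i in rows_with_missing:
            m = miss[i]
            o = ~m
            if not o.any():
                # Nothing observed: the deepest point is the mean itself.
                X[i, m] = mu[m]
                continue
            # Minimizer of the quadratic form (x - mu)' sigma^{-1} (x - mu)
            # over the missing coordinates, with observed ones held fixed.
            coef = np.linalg.solve(sigma[np.ix_(o, o)], X[i, o] - mu[o])
            X[i, m] = mu[m] + sigma[np.ix_(m, o)] @ coef
        # Assumed stopping rule: largest change in any imputed value.
        if np.max(np.abs(X - X_old)) < tol:
            break
    return X

# Usage: one missing entry in a small two-column data set.
data = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, np.nan]])
completed = impute_mahalanobis_depth(data)
```

Observed entries are never modified; only the cells that were NaN in the input are updated at each pass. Other depth functions named in the abstract (Tukey, zonoid) would replace the closed-form step with linear programming or a derivative-free search, which this sketch does not attempt.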
Keywords: Elliptical symmetry; Outliers; Tukey depth; Zonoid depth; Nonparametric imputation; Convex optimization
Pages: 31
New Economics Papers: this item is included in nep-ecm
Downloads:
http://crest.science/RePEc/wpstorage/2017-72.pdf CREST working paper version (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:crs:wpaper:2017-72