A comparison of various imputation algorithms for missing data
Jürgen Kampf,
Iryna Dykun,
Tienush Rassaf and
Amir Abbas Mahabadi
PLOS ONE, 2025, vol. 20, issue 5, 1-16
Abstract:
Background: Many datasets in medicine and other branches of science are incomplete. In this article we compare various imputation algorithms for missing data. Objectives: We take the point of view that it has already been decided that the imputation should be carried out using multiple imputation by chained equation and the only decision left is that of a subroutine for the one-dimensional imputations. The subroutines to be compared are predictive mean matching, weighted predictive mean matching, sampling, classification or regression trees and random forests. Methods: We compare these subroutines on real data and on simulated data. We consider the estimation of expected values, variances and coefficients of linear regression models, logistic regression models and Cox regression models. As real data we use data of the survival times after the diagnosis of an obstructive coronary artery disease with systolic blood pressure, LDL, diabetes, smoking behavior and family history of premature heart diseases as variables for which values have to be imputed. While we are mainly interested in statistical properties like biases, mean squared errors or coverage probabilities of confidence intervals, we also have an eye on the computation time. Results: Weighted predictive mean matching had to be excluded from the statistical comparison due to its enormous computation time. Among the remaining algorithms, in most situations we tested, predictive mean matching performed best. Novelty: This is by far the largest comparison study for subroutines of multiple imputation by chained equations that has been performed up to now.
Date: 2025
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0319784 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 19784&type=printable (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0319784
DOI: 10.1371/journal.pone.0319784
Access Statistics for this article
More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone ().