Evaluation of imputation strategies for multi-centre studies: Application to a large clinical pathology dataset
Lucy Grigoroff,
Reika Masuda,
John Lindon,
Janonna Kadyrov,
Jeremy K Nicholson,
Elaine Holmes and
Julien Wist
PLOS ONE, 2025, vol. 20, issue 11, 1-15
Abstract:
As part of a strategy for accommodating missing data in large heterogeneous datasets, two Random Forest (RF)-based imputation methods, missForest and MICE, were evaluated along with several strategies to help navigate the inherently incomplete structure of the dataset. Background: A total of 3817 complete cases of clinical chemistry variables from a large-scale, multi-site preclinical longitudinal pathology study were used as an evaluation dataset. Three types of ‘missingness’ in various proportions were artificially introduced to compare imputation performance for different strategies, including variable inclusion and stratification. Results: MissForest was found to outperform MICE, being robust and capable of automatic variable selection. Stratification had minimal effect on missForest but severely degraded the performance of MICE. Conclusion: In general, storing and sharing datasets prior to any correction is good practice, so that imputation can be performed on merged data if necessary.
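The evaluation design described in the abstract (hide known values in complete cases, impute them, and score the result against the truth) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, only one missingness mechanism (missing completely at random) is shown because the abstract does not name the three types used, the error metric (RMSE on the hidden cells) is a generic choice, and the column-mean imputer is a deliberately simple placeholder, not missForest or MICE. All function names are hypothetical.

```python
import math
import random


def mask_mcar(rows, proportion, seed=0):
    """Copy `rows` and hide a fixed share of cells completely at random."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    hidden = rng.sample(cells, int(round(proportion * len(cells))))
    masked = [list(r) for r in rows]
    for i, j in hidden:
        masked[i][j] = None  # None marks an artificially removed value
    return masked, hidden


def impute_column_mean(masked):
    """Placeholder imputer (NOT missForest or MICE): fill gaps with column means."""
    means = []
    for j in range(len(masked[0])):
        observed = [r[j] for r in masked if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in masked]


def rmse_on_hidden(truth, imputed, hidden):
    """Root mean squared error over the cells that were hidden."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic "complete cases": 200 rows, 3 made-up numeric variables.
data_rng = random.Random(42)
complete = [
    [data_rng.gauss(5.0, 1.0), data_rng.gauss(100.0, 15.0), data_rng.gauss(0.8, 0.1)]
    for _ in range(200)
]

# Repeat the mask / impute / score cycle for several missingness proportions.
results = {}
for proportion in (0.1, 0.3, 0.5):
    masked, hidden = mask_mcar(complete, proportion, seed=1)
    imputed = impute_column_mean(masked)
    results[proportion] = rmse_on_hidden(complete, imputed, hidden)

for proportion, error in results.items():
    print(f"proportion={proportion:.1f}  RMSE={error:.3f}")
```

To compare real methods under this scheme, `impute_column_mean` would be swapped for a call to the imputer under test while the masking and scoring steps stay fixed, so that every method is scored on the same hidden cells.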
Date: 2025
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0335852 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 35852&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0335852
DOI: 10.1371/journal.pone.0335852
More articles in PLOS ONE from Public Library of Science