EconPapers    
Economics at your fingertips  
 

Bayesian Random Forest with Multiple Imputation by Chain Equations for High-Dimensional Missing Data: A Simulation Study

Oyebayo Ridwan Olaniran and Ali Rashash R. Alzahrani ()
Additional contact information
Oyebayo Ridwan Olaniran: Department of Statistics, Faculty of Physical Sciences, University of Ilorin, Ilorin 1515, Nigeria
Ali Rashash R. Alzahrani: Mathematics Department, Faculty of Sciences, Umm Al-Qura University, Makkah 24382, Saudi Arabia

Mathematics, 2025, vol. 13, issue 6, 1-32

Abstract: The pervasive challenge of missing data in scientific research forces a critical trade-off: discarding incomplete observations, which risks significant information loss, while conventional imputation methods struggle to maintain accuracy in high-dimensional settings. Although approaches like multiple imputation (MI) and random forest (RF) proximity-based imputation offer improvements over naive deletion, they exhibit limitations in complex missing data scenarios or sparse high-dimensional settings. To address these gaps, we propose a novel integration of Multiple Imputation by Chained Equations (MICE) with Bayesian Random Forest (BRF), leveraging MICE’s iterative flexibility and BRF’s probabilistic robustness to enhance the imputation accuracy and downstream predictive performance. Our hybrid framework, BRF-MICE, uniquely combines the efficiency of MICE’s chained equations with BRF’s ability to quantify uncertainty through Bayesian tree ensembles, providing stable parameter estimates even under extreme missingness. We empirically validate this approach using synthetic datasets with controlled missingness mechanisms (MCAR, MAR, MNAR) and dimensionality, contrasting it against established methods, including RF and Bayesian Additive Regression Trees (BART). The results demonstrate that BRF-MICE achieves a superior performance in classification and regression tasks, with a 15–20% lower error under varying missingness conditions compared to RF and BART while maintaining computational scalability. The method’s iterative Bayesian updates effectively propagate imputation uncertainty, reducing overconfidence in high-dimensional predictions, a key weakness of frequentist alternatives.

Keywords: multiple imputation; missing data; Bayesian random forest; high-dimensional analysis; random forest; simulation study (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2025
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/6/956/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/6/956/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:6:p:956-:d:1611773

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-04-05
Handle: RePEc:gam:jmathe:v:13:y:2025:i:6:p:956-:d:1611773