Statistical inference in the presence of imputed survey data through regression trees and random forests
Mehdi Dagdoug, Camelia Goga and David Haziza
Scandinavian Journal of Statistics, 2025, vol. 52, issue 2, 960-998
Abstract:
Item nonresponse in surveys is usually handled through some form of imputation. In recent years, imputation through machine learning procedures has attracted a lot of attention in national statistical offices. However, little is known about the theoretical properties of the resulting point estimators in a survey setting. In this article, we study regression trees and random forests that provide flexible tools for obtaining imputed values. In a high‐dimensional framework allowing the number of predictors to diverge, we lay out a set of conditions for establishing the mean square consistency of regression tree and random forest imputed estimators of a finite population mean. We propose a novel variance estimator based on a K‐fold cross‐validation procedure. The proposed point and variance estimators are assessed through a simulation study in terms of bias, efficiency, and coverage rate of normal‐based confidence intervals. Finally, the choice of hyperparameters involved in random forest algorithms is investigated through theoretical and empirical work.
Date: 2025
Downloads: https://doi.org/10.1111/sjos.12777
Persistent link: https://EconPapers.repec.org/RePEc:bla:scjsta:v:52:y:2025:i:2:p:960-998
Ordering information: This journal article can be ordered from
http://www.blackwell ... bs.asp?ref=0303-6898
Scandinavian Journal of Statistics is currently edited by Ørnulf Borgan and Bo Lindqvist
More articles in Scandinavian Journal of Statistics from Danish Society for Theoretical Statistics, Finnish Statistical Society, Norwegian Statistical Association, Swedish Statistical Association
Bibliographic data for series maintained by Wiley Content Delivery.