Addressing sample selection bias for machine learning methods
Dylan Brewer and Alyssa Carlson
Journal of Applied Econometrics, 2024, vol. 39, issue 3, 383-400
Abstract:
We study approaches for adjusting machine learning methods when the training sample differs from the prediction sample on unobserved dimensions. The machine learning literature predominantly assumes selection only on observed dimensions. Common remedies for selection on observables are to weight observations or to include variables that influence selection. Simulation results show that selection on unobservables increases mean squared prediction error using popular machine‐learning algorithms. Common machine learning practices such as weighting or including variables that influence selection into the training or prediction sample often worsen sample selection bias. We propose two control function approaches that remove the effects of selection bias before training and find that they reduce mean‐squared prediction error in simulations. We apply these approaches to predicting the vote share of the incumbent in gubernatorial elections using previously observed re‐election bids. We find that ignoring selection on unobservables leads to substantially higher predicted vote shares for the incumbent than when the control function approach is used.
Date: 2024
Downloads: (external link)
https://doi.org/10.1002/jae.3029
Related works:
Working Paper: Addressing Sample Selection Bias for Machine Learning Methods (2023) 
Working Paper: Addressing Sample Selection Bias for Machine Learning Methods (2021) 
Persistent link: https://EconPapers.repec.org/RePEc:wly:japmet:v:39:y:2024:i:3:p:383-400
Ordering information: This journal article can be ordered from
http://www3.intersci ... e.jsp?issn=0883-7252
Journal of Applied Econometrics is currently edited by M. Hashem Pesaran
More articles in Journal of Applied Econometrics from John Wiley & Sons, Ltd.
Bibliographic data for series maintained by Wiley Content Delivery.