Random Forest Prediction Intervals
Haozhe Zhang, Joshua Zimmerman, Dan Nettleton and Daniel J. Nordman
The American Statistician, 2020, vol. 74, issue 4, 392-406
Abstract:
Random forests are among the most popular machine learning techniques for prediction problems. When using random forests to predict a quantitative response, an important but often overlooked challenge is the determination of prediction intervals that will contain an unobserved response value with a specified probability. We propose new random forest prediction intervals that are based on the empirical distribution of out-of-bag prediction errors. These intervals can be obtained as a by-product of a single random forest. Under regularity conditions, we prove that the proposed intervals have asymptotically correct coverage rates. Simulation studies and analysis of 60 real datasets are used to compare the finite-sample properties of the proposed intervals with quantile regression forests and recently proposed split conformal intervals. The results indicate that intervals constructed with our proposed method tend to be narrower than those of competing methods while still maintaining marginal coverage rates approximately equal to nominal levels.
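The interval construction described in the abstract can be illustrated with a minimal sketch. It assumes the out-of-bag (OOB) predictions for the training responses have already been produced by some random forest implementation, and it shifts a new point prediction by the lower and upper empirical quantiles of the OOB prediction errors. The function names `empirical_quantile` and `oob_prediction_interval`, the argument names, and the particular quantile convention are assumptions made for illustration; they are not taken from the article, and the authors' own implementation may differ in detail.

```python
import math


def empirical_quantile(values, gamma):
    """Return the smallest value v with at least a fraction gamma of values <= v.

    This quantile convention is an assumption; other conventions exist.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Clamp the rank to 1..n so that gamma = 0 or gamma = 1 stays in range.
    rank = min(max(math.ceil(gamma * n), 1), n)
    return ordered[rank - 1]


def oob_prediction_interval(y_train, oob_pred, point_pred, alpha=0.1):
    """Sketch of a (1 - alpha) prediction interval from OOB prediction errors.

    y_train    : observed training responses
    oob_pred   : OOB predictions for the same training cases (computed elsewhere)
    point_pred : random forest point prediction for the new case
    """
    if len(y_train) != len(oob_pred):
        raise ValueError("y_train and oob_pred must have the same length")
    # OOB prediction error: observed response minus its OOB prediction.
    errors = [y - p for y, p in zip(y_train, oob_pred)]
    lower = point_pred + empirical_quantile(errors, alpha / 2)
    upper = point_pred + empirical_quantile(errors, 1 - alpha / 2)
    return lower, upper
```

Because the OOB errors come from the same forest that produces the point prediction, no second model fit is needed in this sketch, which is consistent with the abstract's remark that the intervals are a by-product of a single random forest.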
Date: 2020
Downloads: (external link)
http://hdl.handle.net/10.1080/00031305.2019.1585288 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:amstat:v:74:y:2020:i:4:p:392-406
DOI: 10.1080/00031305.2019.1585288