Random forest with acceptance–rejection trees
Peter Calhoun,
Melodie J. Hallett,
Xiaogang Su,
Guy Cafri,
Richard A. Levine and
Juanjuan Fan
Additional contact information
Peter Calhoun: Jaeb Center for Health Research
Melodie J. Hallett: San Diego State University
Xiaogang Su: University of Texas
Guy Cafri: Johnson & Johnson Medical Devices
Richard A. Levine: San Diego State University
Juanjuan Fan: San Diego State University
Computational Statistics, 2020, vol. 35, issue 3, No 4, 983-999
Abstract:
In this paper, we propose a new random forest method based on completely randomized splitting rules with an acceptance–rejection criterion for quality control. We show how the proposed acceptance–rejection (AR) algorithm can outperform the standard random forest algorithm (RF) and several of its variants, including extremely randomized (ER) trees and smooth sigmoid surrogate (SSS) trees. Twenty datasets were analyzed to compare prediction performance, and a simulated dataset was used to assess variable selection bias. In terms of prediction accuracy for classification problems, the proposed AR algorithm performed best, with ER second. For regression problems, RF and SSS performed best, followed by AR and then ER. However, each algorithm was the most accurate for at least one study. We investigate scenarios where the AR algorithm can yield better predictive performance. In terms of variable importance, both RF and SSS demonstrated selection bias in favor of variables with many possible splits, while both ER and AR largely removed this bias.
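The variable-importance finding rests on a well-known effect: when each split is chosen by exhaustive search over cut points, a variable with many distinct values gets many more chances to look good purely by chance. The Python sketch below is not the authors' code. It simulates this effect with a response unrelated to either predictor, comparing an exhaustive CART/RF-style search against a completely randomized split that picks the variable and cut point without looking at the response. It deliberately omits the acceptance–rejection step, because the abstract does not specify that criterion; all function names and simulation settings (n = 50, 300 replications, a 0/1 variable versus a uniform one) are illustrative assumptions.

```python
import random
from collections import Counter


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def candidate_cuts(x):
    """Midpoints between consecutive distinct values of x."""
    u = sorted(set(x))
    return [(a + b) / 2 for a, b in zip(u, u[1:])]


def split_gain(x, y, cut):
    """Reduction in SSE obtained by splitting y on x <= cut."""
    left = [yi for xi, yi in zip(x, y) if xi <= cut]
    right = [yi for xi, yi in zip(x, y) if xi > cut]
    return sse(y) - sse(left) - sse(right)


def exhaustive_choice(features, y):
    """CART/RF-style search: return the variable whose best cut has the largest gain."""
    best_name, best_gain = None, float("-inf")
    for name, x in features.items():
        for cut in candidate_cuts(x):
            g = split_gain(x, y, cut)
            if g > best_gain:
                best_name, best_gain = name, g
    return best_name


def random_split(features, rng):
    """Completely randomized split: variable and cut drawn without using y.
    (Hypothetical sketch; no acceptance-rejection check is applied here.)"""
    name = rng.choice(sorted(features))
    x = features[name]
    cut = rng.uniform(min(x), max(x))
    return name, cut


def simulate(n=50, reps=300, seed=0):
    """Count how often each variable is chosen when y is pure noise."""
    rng = random.Random(seed)
    exhaustive, randomized = Counter(), Counter()
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]  # response unrelated to any x
        features = {
            "binary": [i % 2 for i in range(n)],              # 1 possible cut
            "continuous": [rng.random() for _ in range(n)],   # n - 1 possible cuts
        }
        exhaustive[exhaustive_choice(features, y)] += 1
        randomized[random_split(features, rng)[0]] += 1
    return exhaustive, randomized


if __name__ == "__main__":
    ex, rd = simulate()
    print("exhaustive:", dict(ex))
    print("randomized:", dict(rd))
```

Under these assumptions the exhaustive search should select the continuous variable in the large majority of replications even though neither variable carries signal, while the randomized rule selects each variable roughly half the time.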
Keywords: Classification and regression trees; Supervised learning; Prediction; Variable selection bias; Ensemble methods
Date: 2020
Downloads: http://link.springer.com/10.1007/s00180-019-00929-4 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:compst:v:35:y:2020:i:3:d:10.1007_s00180-019-00929-4
Ordering information: This journal article can be ordered from
http://www.springer.com/statistics/journal/180/PS2
DOI: 10.1007/s00180-019-00929-4
Computational Statistics is currently edited by Wataru Sakamoto, Ricardo Cao and Jürgen Symanzik
More articles in Computational Statistics from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.