EconPapers    

To Bag is to Prune

Philippe Goulet Coulombe
Philippe Goulet Coulombe: University of Pennsylvania

No 21-03, Working Papers from Chair in macroeconomics and forecasting, University of Quebec in Montreal's School of Management

Abstract: It is notoriously difficult to build a bad Random Forest (RF). At the same time, RF blatantly overfits in-sample without any apparent consequence out-of-sample. Standard arguments, such as the classic bias-variance trade-off or double descent, cannot rationalize this paradox. I propose a new explanation: bootstrap aggregation and model perturbation, as implemented by RF, automatically prune a latent "true" tree. More generally, randomized ensembles of greedily optimized learners implicitly perform optimal early stopping out-of-sample, so there is no need to tune the stopping point. By construction, novel variants of Boosting and MARS are also eligible for automatic tuning. I demonstrate the property empirically, with simulated and real data, by reporting that these new, completely overfitting ensembles perform similarly to their tuned counterparts, or better.
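The in-sample versus out-of-sample contrast described in the abstract can be reproduced in a toy setting. The sketch below is an illustration under stated assumptions, not the paper's code or its pruning argument: it uses only the Python standard library, a single regressor, an assumed sine-plus-noise data-generating process, and hypothetical helpers `fit_full_tree`, `bagged_trees` and `simulate`. It relies on the fact that, in one dimension, a fully grown regression tree with midpoint splits predicts the response of the nearest training point, so no tree library is needed.

```python
import bisect
import math
import random


def fit_full_tree(xs, ys):
    """Hypothetical helper: a fully grown 1-D regression tree.

    One leaf per distinct x value, with split thresholds at the midpoints
    between neighbouring x values, so a prediction is the mean response
    stored at the nearest training x.
    """
    leaves = {}
    for x, y in zip(xs, ys):
        leaves.setdefault(x, []).append(y)
    keys = sorted(leaves)
    means = [sum(leaves[k]) / len(leaves[k]) for k in keys]
    cuts = [(a + b) / 2 for a, b in zip(keys, keys[1:])]

    def predict(x):
        # bisect_right finds the leaf; a point exactly on a cut goes right
        return means[bisect.bisect_right(cuts, x)]

    return predict


def bagged_trees(xs, ys, n_trees, rng):
    """Average of fully grown trees, each fit on a bootstrap resample."""
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows with replacement
        trees.append(fit_full_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / n_trees


def mse(f, xs, ys):
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def simulate(n=400, n_trees=100, noise=1.0, seed=0):
    """Hypothetical DGP: y = sin(2*pi*x) + Gaussian noise, x ~ U(0, 1)."""
    rng = random.Random(seed)

    def draw(m):
        xs = [rng.uniform(0.0, 1.0) for _ in range(m)]
        ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise) for x in xs]
        return xs, ys

    x_tr, y_tr = draw(n)
    x_te, y_te = draw(n)
    tree = fit_full_tree(x_tr, y_tr)
    bag = bagged_trees(x_tr, y_tr, n_trees, rng)
    return {
        "tree_in": mse(tree, x_tr, y_tr),
        "tree_out": mse(tree, x_te, y_te),
        "bag_in": mse(bag, x_tr, y_tr),
        "bag_out": mse(bag, x_te, y_te),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the single tree reproduces the training data exactly (in-sample MSE of zero) and the bagged ensemble also fits the training data closely, yet the ensemble's out-of-sample error is typically well below the single tree's. With only one regressor there is no feature subsampling, so the sketch isolates bootstrap aggregation and says nothing about RF's other perturbations or about the Boosting and MARS variants.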

Keywords: Random Forest; Trees; Pruning; Greedy Algorithms; Double Descent; Deep Learning.
Pages: 33 pages
Date: 2021-03, Revised 2021-06

Downloads: (external link)
https://chairemacro.esg.uqam.ca/wp-content/uploads ... BITP_permanent-2.pdf Revised version, 2020 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:bbh:wpaper:21-03

Bibliographic data for series maintained by Dalibor Stevanovic and Alain Guay.

 
Page updated 2025-04-03
Handle: RePEc:bbh:wpaper:21-03