EconPapers    
 

A bias-variance analysis of state-of-the-art random forest text classifiers

Thiago Salles, Leonardo Rocha and Marcos Gonçalves
Affiliations:
Thiago Salles: Federal University of Minas Gerais
Leonardo Rocha: Federal University of São João Del Rei
Marcos Gonçalves: Federal University of Minas Gerais

Advances in Data Analysis and Classification, 2021, vol. 15, issue 2, No 6, 379-405

Abstract: Random forest (RF) classifiers excel in a variety of automatic classification tasks, such as topic categorization and sentiment analysis. Despite these advantages, RF models have been shown to perform poorly on noisy data, which is common in, for instance, textual data. Several RF variants have been proposed to provide better generalization under such challenging scenarios, including lazy, boosted and randomized forests, all of which exhibit significant reductions in error rate compared to traditional RFs. In this work, we analyze the behavior of these variants under the bias-variance decomposition of the error rate. Such an analysis is of utmost importance to uncover the main causes of the improvements in classification effectiveness enjoyed by these variants. As we shall see, significant reductions in variance, together with stable bias, explain a large portion of the improvements for the lazy and boosted RF variants. The analysis also sheds light on promising directions for further enhancing RF-based learners, such as introducing new sources of randomization into both the lazy and the boosted variants.
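The record does not say which bias-variance decomposition of the 0-1 error rate the paper uses. Purely as an illustration, the sketch below assumes Domingos' unified decomposition with the noise term taken as zero: several models (e.g. forests trained on different resamples of the training set) predict the same test set, the "main prediction" for each example is the most frequent label, bias is whether that main prediction is wrong, and variance is the fraction of models that disagree with it. The function name `bias_variance_01` and its input layout are hypothetical, not taken from the paper.

```python
from collections import Counter
from typing import Dict, Sequence


def bias_variance_01(preds: Sequence[Sequence[int]], y_true: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: Domingos-style 0-1 loss decomposition, assuming zero noise.

    preds[m][i] is the label predicted by model m (trained on its own resample)
    for test example i; y_true[i] is the true label of example i.
    """
    n_models, n = len(preds), len(y_true)
    tot_bias = tot_var = tot_net_var = tot_loss = 0.0
    for i in range(n):
        column = [preds[m][i] for m in range(n_models)]
        # Main prediction: most frequent label (ties go to the label seen first).
        main_pred, _ = Counter(column).most_common(1)[0]
        bias = 1.0 if main_pred != y_true[i] else 0.0
        disagree = [p for p in column if p != main_pred]
        var = len(disagree) / n_models
        if bias == 0.0:
            # Unbiased example: variance adds to the error.
            coef = 1.0
        elif disagree:
            # Biased example: variance that lands on the true label reduces the error.
            coef = -sum(p == y_true[i] for p in disagree) / len(disagree)
        else:
            coef = 0.0
        tot_loss += sum(p != y_true[i] for p in column) / n_models
        tot_bias += bias
        tot_var += var
        tot_net_var += coef * var
    return {
        "error": tot_loss / n,
        "bias": tot_bias / n,
        "variance": tot_var / n,
        "net_variance": tot_net_var / n,  # error == bias + net_variance (zero noise)
    }


# Three models, three test examples.
result = bias_variance_01([[0, 1, 1], [0, 1, 0], [1, 1, 0]], [0, 1, 1])
print(result)
```

Under this decomposition, a variant that lowers variance while keeping bias stable lowers the average error exactly through the `net_variance` term, which is the kind of effect the abstract reports for the lazy and boosted forests.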

Keywords: Random forests; Text classification; Bias variance analysis; 62K25; 62F86
Date: 2021

Downloads: http://link.springer.com/10.1007/s11634-020-00409-4 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:advdac:v:15:y:2021:i:2:d:10.1007_s11634-020-00409-4

Ordering information: This journal article can be ordered from
http://www.springer. ... ds/journal/11634/PS2

DOI: 10.1007/s11634-020-00409-4

Advances in Data Analysis and Classification is currently edited by H.-H. Bock, W. Gaul, A. Okada, M. Vichi and C. Weihs

More articles in Advances in Data Analysis and Classification from Springer, German Classification Society - Gesellschaft für Klassifikation (GfKl), Japanese Classification Society (JCS), Classification and Data Analysis Group of the Italian Statistical Society (CLADAG), International Federation of Classification Societies (IFCS)
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Page updated 2025-03-20
Handle: RePEc:spr:advdac:v:15:y:2021:i:2:d:10.1007_s11634-020-00409-4