Ensemble of optimal trees, random forest and random projection ensemble classification

Khan, Zardad; Gul, Asma; Perperoglou, Aris; Miftahuddin, Miftahuddin; Mahmoud, Osama; Adler, Werner; Lausen, Berthold

Author affiliations:
Zardad Khan: Abdul Wali Khan University
Asma Gul: University of Essex
Aris Perperoglou: University of Essex
Miftahuddin Miftahuddin: University of Essex
Osama Mahmoud: University of Essex
Werner Adler: University of Erlangen-Nuremberg
Berthold Lausen: University of Essex

Advances in Data Analysis and Classification, 2020, vol. 14, issue 1, No 6, 97-116

Abstract: The predictive performance of a random forest ensemble is highly associated with the strength of its individual trees and their diversity. An ensemble of a small number of accurate and diverse trees also reduces the computational burden, provided prediction accuracy is not compromised. We investigate the idea of integrating trees that are accurate and diverse. For this purpose, we use the out-of-bag observations of the training bootstrap samples as a validation sample to choose the best trees based on their individual performance, and then assess these trees for diversity using the Brier score on an independent validation sample. Starting from the best tree, a tree is selected for the final ensemble only if its addition reduces the error of the trees already added. Unlike random projection ensemble classification, our approach does not use implicit dimension reduction for each tree. A total of 35 benchmark problems on classification and regression are used to assess the performance of the proposed method and to compare it with random forest, random projection ensemble, node harvest, support vector machine, kNN and classification and regression tree. We compute unexplained variances or classification error rates for all the methods on the corresponding data sets. Our experiments reveal that the size of the ensemble is reduced significantly and that better results are obtained in most cases. Results of a simulation study are also given, in which four tree-style scenarios are used to generate data sets with several structures.
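The two-stage selection described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration for binary classification, not the authors' implementation: it assumes the trees have already been grown on bootstrap samples and that, for each tree, its out-of-bag error and its predicted class-1 probabilities on an independent validation sample are supplied as plain lists. The names `brier_score` and `select_optimal_trees`, the cut-off parameter `top_m`, and the toy numbers are illustrative assumptions.

```python
from typing import List, Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted P(y = 1) and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def select_optimal_trees(
    oob_errors: Sequence[float],
    val_probs: Sequence[Sequence[float]],
    val_labels: Sequence[int],
    top_m: int,
) -> List[int]:
    """Hypothetical sketch of the accuracy-then-diversity tree selection.

    oob_errors[i]:   out-of-bag error of tree i (stage 1, individual accuracy)
    val_probs[i][j]: tree i's predicted P(y = 1) for validation case j
    val_labels[j]:   0/1 label of validation case j
    top_m:           number of best-ranked trees to consider (assumed parameter)
    """
    if not oob_errors or top_m < 1:
        raise ValueError("need at least one tree and top_m >= 1")

    # Stage 1: rank trees by individual OOB error and keep the top_m best.
    ranked = sorted(range(len(oob_errors)), key=lambda i: oob_errors[i])[:top_m]

    # Stage 2: start from the best tree and consider the others in rank order.
    selected = [ranked[0]]
    prob_sum = list(val_probs[ranked[0]])           # running sum of member probabilities
    best_score = brier_score(prob_sum, val_labels)  # one tree: sum equals the average

    for i in ranked[1:]:
        candidate_sum = [s + p for s, p in zip(prob_sum, val_probs[i])]
        k = len(selected) + 1
        candidate_score = brier_score([s / k for s in candidate_sum], val_labels)
        # Keep tree i only if it lowers the Brier score of the ensemble built so far.
        if candidate_score < best_score:
            selected.append(i)
            prob_sum = candidate_sum
            best_score = candidate_score
    return selected


# Toy inputs (made up for illustration): 4 trees, 4 validation cases.
oob_errors = [0.20, 0.10, 0.30, 0.15]
val_labels = [1, 0, 1, 0]
val_probs = [
    [0.9, 0.2, 0.6, 0.3],
    [0.8, 0.1, 0.7, 0.2],
    [0.4, 0.6, 0.5, 0.5],
    [0.7, 0.3, 0.9, 0.1],
]

print(select_optimal_trees(oob_errors, val_probs, val_labels, top_m=4))  # [1, 3]
```

Here tree 1 (lowest OOB error) seeds the ensemble with a validation Brier score of 0.045; tree 3 lowers it to 0.04125 and is kept, while trees 0 and 2 would raise it and are skipped. Averaging member probabilities to form the ensemble prediction is an assumption of this sketch.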

Keywords: Ensemble classification; Ensemble regression; Random forest; Random projection ensemble classification; Accuracy and diversity; 62-00; 62-07
Date: 2020
Citations: 4 (EconPapers)

Downloads: http://link.springer.com/10.1007/s11634-019-00364-9 (abstract, text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:advdac:v:14:y:2020:i:1:d:10.1007_s11634-019-00364-9

Ordering information: This journal article can be ordered from
http://www.springer. ... ds/journal/11634/PS2

DOI: 10.1007/s11634-019-00364-9

Advances in Data Analysis and Classification is currently edited by H.-H. Bock, W. Gaul, A. Okada, M. Vichi and C. Weihs

More articles in Advances in Data Analysis and Classification from Springer, German Classification Society - Gesellschaft für Klassifikation (GfKl), Japanese Classification Society (JCS), Classification and Data Analysis Group of the Italian Statistical Society (CLADAG), International Federation of Classification Societies (IFCS)
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.