Learning from high dimensional data based on weighted feature importance in decision tree ensembles
Nayiri Galestian Pour and
Soudabeh Shemehsavar
Author affiliations:
Nayiri Galestian Pour: University of Tehran
Soudabeh Shemehsavar: University of Tehran
Computational Statistics, 2024, vol. 39, issue 1, No 16, 313-342
Abstract:
Learning from high-dimensional data has been utilized in various applications such as computational biology, image classification, and finance. Most classical machine learning algorithms fail to give accurate predictions in high-dimensional settings because of the enormous feature space. In this article, we present a novel ensemble of classification trees based on weighted random subspaces that aims to adjust the distribution of selection probabilities. In the proposed algorithm, base classifiers are built on random feature subspaces, and the probability that influential features will be selected for the next subspace is updated through a weighting function that incorporates grouping information from previous classifiers. As an interpretation tool, we show that variable importance measures computed by the new method can identify influential features efficiently. We provide theoretical reasoning for the different elements of the proposed method, and we evaluate the usefulness of the new method through simulation studies and real data analysis.
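The core loop described in the abstract (draw a feature subspace, fit a tree on it, then shift selection probabilities toward features the tree found important) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the weighting function or how grouping information enters, so `update_weights` uses a hypothetical multiplicative rule, omits grouping entirely, and the importances in the usage lines are made-up placeholders standing in for those of a fitted tree. The names `sample_subspace`, `update_weights`, `eta`, and `floor` are all hypothetical.

```python
import random


def sample_subspace(weights, m, rng):
    """Draw m distinct feature indices; each draw picks among the remaining
    features with probability proportional to their current weights
    (weighted sampling without replacement)."""
    assert 0 < m <= len(weights), "subspace size must be between 1 and p"
    available = list(range(len(weights)))
    chosen = []
    for _ in range(m):
        total = sum(weights[j] for j in available)
        r = rng.random() * total
        acc = 0.0
        for idx, j in enumerate(available):
            acc += weights[j]
            if r < acc:
                break
        # If float rounding prevents a break, idx is the last position.
        chosen.append(available.pop(idx))
    return chosen


def update_weights(weights, subspace, importances, eta=1.0, floor=1e-3):
    """HYPOTHETICAL weighting function (not taken from the paper):
    multiply the weight of each feature in the last subspace by
    (1 + eta * its share of that tree's total importance), keep every
    weight at least `floor` before normalizing, then renormalize so the
    weights sum to 1. Grouping information is not modeled here."""
    total_imp = sum(importances)
    new = list(weights)
    if total_imp > 0:
        for j, imp in zip(subspace, importances):
            new[j] = weights[j] * (1.0 + eta * imp / total_imp)
    new = [max(w, floor) for w in new]
    s = sum(new)
    return [w / s for w in new]


# Usage: p features, subspaces of size m, uniform starting probabilities.
rng = random.Random(0)
p, m = 10, 3
weights = [1.0 / p] * p
subspace = sample_subspace(weights, m, rng)
# Placeholder importances standing in for a tree fit on `subspace`.
importances = [0.7, 0.2, 0.1]
weights = update_weights(weights, subspace, importances, eta=2.0)
```

With these placeholder values, the three sampled features end up above the uniform 1/p = 0.1 or equal to it (about 0.200, 0.117, and 0.100), while every unselected feature drops to about 0.083, so the next subspace is more likely to contain the features this tree ranked highest. The multiplicative form with a floor is one simple choice that keeps every feature selectable; the paper's actual weighting function may differ.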
Keywords: Ensemble learning; Decision trees; Random subspace; Variable importance measure; High dimensional data
Date: 2024
Downloads:
http://link.springer.com/10.1007/s00180-023-01347-3 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:compst:v:39:y:2024:i:1:d:10.1007_s00180-023-01347-3
Ordering information: This journal article can be ordered from
http://www.springer.com/statistics/journal/180/PS2
DOI: 10.1007/s00180-023-01347-3
Computational Statistics is currently edited by Wataru Sakamoto, Ricardo Cao and Jürgen Symanzik
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.