Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces

Xu, Baoxun; Huang, Joshua Zhexue; Williams, Graham; Wang, Qiang; Ye, Yunming

Additional contact information
Baoxun Xu: Harbin Institute of Technology Shenzhen Graduate School, China
Joshua Zhexue Huang: Shenzhen Institutes of Advanced Technology and Chinese Academy of Sciences, China
Graham Williams: Shenzhen Institutes of Advanced Technology and Chinese Academy of Sciences, China
Qiang Wang: Harbin Institute of Technology Shenzhen Graduate School, China
Yunming Ye: Harbin Institute of Technology Shenzhen Graduate School, China

International Journal of Data Warehousing and Mining (IJDWM), 2012, vol. 8, issue 2, 44-63

Abstract: Selecting the feature subspaces in which decision trees are grown is a key step in building random forest models. However, the common approach of randomly sampling a few features for each subspace is not suitable for high-dimensional data consisting of thousands of features, because such data often contain many features that are uninformative for classification, and random sampling often fails to include informative features in the selected subspaces. Consequently, the classification performance of the random forest model is significantly degraded. In this paper, the authors propose an improved random forest method that uses a novel feature weighting method for subspace selection and thereby enhances classification performance on high-dimensional data. A series of experiments on nine real-life high-dimensional datasets demonstrated that, using a subspace size of features where M is the total number of features in the dataset, the proposed random forest model significantly outperforms existing random forest models.
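
The abstract's argument about random subspaces can be made concrete with a short Python sketch. The first function computes the chance that a uniformly sampled subspace contains no informative feature at all; the second draws a subspace in proportion to per-feature weights. All numbers and weights here are hypothetical and not taken from the paper, and the weighted draw uses a generic technique (Efraimidis-Spirakis weighted sampling without replacement) as a stand-in, since the abstract does not describe the authors' weighting scheme.

```python
import random
from math import comb


def prob_no_informative(total, informative, subspace_size):
    """Probability that a uniform sample (without replacement) of
    `subspace_size` features out of `total` misses every one of the
    `informative` features (hypergeometric zero-hit probability)."""
    return comb(total - informative, subspace_size) / comb(total, subspace_size)


def weighted_subspace(weights, subspace_size, rng):
    """Draw `subspace_size` distinct feature indices with probability
    related to `weights` (all weights must be positive).

    Stand-in technique, not the paper's method: each feature gets the key
    u ** (1 / w) with u uniform in [0, 1), and the largest keys win.
    """
    keys = [(rng.random() ** (1.0 / w), j) for j, w in enumerate(weights)]
    top = sorted(keys, reverse=True)[:subspace_size]
    return sorted(j for _, j in top)


# Hypothetical setting: 5000 features, the first 50 of them informative,
# and a subspace of 13 features per tree.
M, informative, m = 5000, 50, 13
print(f"P(no informative feature) = {prob_no_informative(M, informative, m):.3f}")

# Hypothetical weights: informative features weighted 100, all others 1.
weights = [100.0] * informative + [1.0] * (M - informative)
subspace = weighted_subspace(weights, m, random.Random(0))
print("informative features drawn:", sum(j < informative for j in subspace))
```

Under these assumed numbers, a uniform draw misses every informative feature with probability of about 0.88, which is the failure mode the abstract describes. Weighting the draw toward informative features makes such an empty subspace very unlikely.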

Date: 2012

Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 4018/jdwm.2012040103 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:igg:jdwm00:v:8:y:2012:i:2:p:44-63

International Journal of Data Warehousing and Mining (IJDWM) is currently edited by Eric Pardede