Identifying Informative Predictor Variables With Random Forests
Yannick Rothacher and Carolin Strobl
Additional contact information: Carolin Strobl, University of Zurich
Journal of Educational and Behavioral Statistics, 2024, vol. 49, issue 4, 595-629
Abstract:
Random forests are a nonparametric machine learning method that is currently gaining popularity in the behavioral sciences. Despite random forests’ potential advantages over more conventional statistical methods, an open question is how reliably informative predictor variables can be identified with random forests. The present study aims to give an accessible introduction to variable selection with random forests and to provide an overview of the currently proposed selection methods. Using simulation studies, the variable selection methods are examined with respect to their statistical properties, and their performance is compared with that of a conventional linear model. Advantages and disadvantages of the examined methods are discussed, and practical recommendations for the use of random forests for variable selection are given.
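As an illustration of the kind of importance-based screening the article evaluates, the following minimal sketch fits a random forest to simulated data with a few informative and several noise predictors and ranks the predictors by permutation importance on held-out data. The sketch is not taken from the article: the simulated data-generating model, the use of scikit-learn, and the two-standard-deviation screening rule are assumptions made purely for demonstration.

    # Illustrative sketch (not the article's simulation design): permutation
    # variable importance with a random forest on simulated data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n, p_informative, p_noise = 500, 3, 7
    X = np.hstack([rng.normal(size=(n, p_informative)),
                   rng.normal(size=(n, p_noise))])
    # Only the first three predictors enter the data-generating model.
    y = 2.0 * X[:, 0] + 1.0 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(scale=1.0, size=n)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    forest = RandomForestRegressor(n_estimators=500, random_state=0)
    forest.fit(X_train, y_train)

    # Permutation importance on held-out data; predictors whose mean importance
    # clearly exceeds its variability are flagged as candidate informative variables.
    result = permutation_importance(forest, X_test, y_test, n_repeats=30, random_state=0)
    for j in range(X.shape[1]):
        mean, sd = result.importances_mean[j], result.importances_std[j]
        flag = "informative?" if mean - 2 * sd > 0 else ""
        print(f"x{j}: importance = {mean:.3f} +/- {sd:.3f} {flag}")

Such a threshold-based screen is only a heuristic; the article's point is precisely that the statistical properties of such selection rules need to be examined.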
Keywords: random forest; variable importance; interpretable machine learning; recursive partitioning; variable selection
Date: 2024
Downloads: https://journals.sagepub.com/doi/10.3102/10769986231193327 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:sae:jedbes:v:49:y:2024:i:4:p:595-629
DOI: 10.3102/10769986231193327