Interaction forests: Identifying and exploiting interpretable quantitative and qualitative interaction effects
Roman Hornung and Anne-Laure Boulesteix
Computational Statistics & Data Analysis, 2022, vol. 171, issue C
Abstract:
Although interaction effects can be exploited to improve predictions and allow for valuable insights into covariate interplay, they are given limited attention in analysis. Interaction forests are a variant of random forests for categorical, continuous, and survival outcomes that explicitly models quantitative and qualitative interaction effects in bivariable splits performed by the trees constituting the forests. The new effect importance measure (EIM) associated with interaction forests allows for ranking of covariate pairs with respect to their interaction effects' importance to prediction. Using EIM, separate importance value lists for univariable effects, quantitative interaction effects, and qualitative interaction effects are obtained. In the spirit of interpretable machine learning, the bivariable split types of interaction forests target easily interpretable and communicable interaction effects. To learn about the nature of the interplay between covariates identified as interacting, it is convenient to visualise their estimated bivariable influence. Functions that perform this task are provided in the R package diversityForest, which implements interaction forests. In a large-scale empirical study using 220 data sets, interaction forests tended to deliver better predictions than conventional random forests and competing random forest variants that use multivariable splitting. In a simulation study, EIM delivered considerably better rankings for the relevant quantitative and qualitative interaction effects than competing approaches. These results indicate that interaction forests are suitable tools for the challenging task of identifying and making use of easily interpretable and communicable interaction effects in predictive modelling.
Keywords: Interaction effects; Random forest; Feature importance; Non-parametric modeling; Machine learning
Date: 2022
Full text (ScienceDirect subscribers only): http://www.sciencedirect.com/science/article/pii/S0167947322000408
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:171:y:2022:i:c:s0167947322000408
DOI: 10.1016/j.csda.2022.107460
Computational Statistics & Data Analysis is currently edited by S.P. Azen