An explicit split point procedure in model-based trees allowing for a quick fitting of GLM trees and GLM forests

Dutang, Christophe; Guibert, Quentin

Christophe Dutang: CEREMADE - CEntre de REcherches en MAthématiques de la DEcision - Université Paris Dauphine-PSL - PSL - Université Paris Sciences et Lettres - CNRS - Centre National de la Recherche Scientifique
Quentin Guibert: CEREMADE - CEntre de REcherches en MAthématiques de la DEcision - Université Paris Dauphine-PSL - PSL - Université Paris Sciences et Lettres - CNRS - Centre National de la Recherche Scientifique

Post-Print from HAL

Abstract: Classification and regression trees (CART) are a genuine alternative to fully parametric models such as linear models (LM) and generalized linear models (GLM). Although CART suffers from biased variable selection, it is commonly applied to various topics and used in tree ensembles and random forests because of its simplicity and computation speed. Conditional inference tree and model-based tree algorithms, in which variable selection is handled via fluctuation tests, are known to give more accurate and interpretable results than CART, but require longer computation times. Using a closed-form maximum likelihood estimator for GLM, this paper proposes a split point procedure based on the explicit likelihood in order to save time when searching for the best split for a given splitting variable. A simulation study for non-Gaussian responses is performed to assess the computational gain when building GLM trees. We also propose a benchmark on simulated and empirical datasets of GLM trees against CART, conditional inference trees and LM trees in order to identify situations where GLM trees are efficient. This approach is extended to multiway split trees and log-transformed distributions. Making GLM trees possible through a new split point procedure allows us to investigate the use of GLM in ensemble methods. We propose a numerical comparison of GLM forests against other random forest-type approaches. Our simulation analyses show cases where GLM forests are good challengers to random forests.
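
The idea of scoring candidate split points with an explicit likelihood can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes the simplest possible case, a Poisson response with an intercept-only model in each child node, where the maximum likelihood estimate is the sample mean and the maximized log-likelihood has a closed form. The function names `poisson_loglik_at_mle` and `best_split` are hypothetical and chosen for illustration only.

```python
import numpy as np

def poisson_loglik_at_mle(sum_y, n):
    """Poisson log-likelihood evaluated at the closed-form MLE (the sample mean).

    The term -sum(log(y!)) is dropped because it is the same for every split.
    With mean = sum_y / n, the remaining part is sum_y * log(mean) - sum_y.
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        # Convention 0 * log(0) = 0 for nodes whose responses are all zero.
        term = np.where(sum_y > 0, sum_y * np.log(sum_y / n), 0.0)
    return term - sum_y

def best_split(x, y, min_size=1):
    """Scan all thresholds of one splitting variable without any iterative fit.

    Assumption (illustrative only): each child node gets an intercept-only
    Poisson model, so cumulative sums are enough to score every candidate.
    Returns (threshold, log-likelihood) or None if no valid split exists.
    """
    order = np.argsort(x, kind="stable")
    xs, ys = np.asarray(x)[order], np.asarray(y, dtype=float)[order]
    n = len(ys)
    if n < 2:
        return None
    cum = np.cumsum(ys)
    left_n = np.arange(1, n)            # size of the left child for each cut
    left_sum = cum[:-1]
    right_sum = cum[-1] - left_sum
    loglik = (poisson_loglik_at_mle(left_sum, left_n)
              + poisson_loglik_at_mle(right_sum, n - left_n))
    # A cut is valid only between two distinct x values and if both
    # children contain at least min_size observations.
    valid = (xs[1:] != xs[:-1]) & (left_n >= min_size) & (n - left_n >= min_size)
    if not valid.any():
        return None
    loglik = np.where(valid, loglik, -np.inf)
    i = int(np.argmax(loglik))
    return 0.5 * (xs[i] + xs[i + 1]), float(loglik[i])

# Usage: counts jump from small to large between x = 3 and x = 4.
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([0, 1, 0, 9, 10, 11])
threshold, loglik = best_split(x, y)
```

Because each candidate is scored from cumulative sums, the whole scan costs one sort plus a linear pass, which is the kind of saving an explicit likelihood makes possible compared with refitting a model numerically at every candidate threshold.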

Keywords: GLM; model-based recursive partitioning; GLM trees; random forest; GLM forest
Date: 2021-11-11
New Economics Papers: this item is included in nep-cmp, nep-ecm and nep-ore
Note: View the original document on HAL open archive server: https://hal.science/hal-03448250v1

Published in Statistics and Computing, 2021, 32 (1), ⟨10.1007/s11222-021-10059-x⟩

Downloads: (external link)
https://hal.science/hal-03448250v1/document (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-03448250

DOI: 10.1007/s11222-021-10059-x
