bsnsing: A Decision Tree Induction Method Based on Recursive Optimal Boolean Rule Composition
Yanchao Liu ()
Additional contact information
Yanchao Liu: Department of Industrial and Systems Engineering, Wayne State University, Detroit, Michigan 48202
INFORMS Journal on Computing, 2022, vol. 34, issue 6, 2908-2929
Abstract:
This paper proposes a new mixed-integer programming (MIP) formulation to optimize split rule selection in the decision tree induction process and develops an efficient search algorithm that is able to solve practical instances of the MIP model faster than commercial solvers. The formulation is novel for it directly maximizes the Gini reduction, an effective split selection criterion that has never been modeled in a mathematical program for its nonconvexity. The proposed approach differs from other optimal classification tree models in that it does not attempt to optimize the whole tree; therefore, the flexibility of the recursive partitioning scheme is retained, and the optimization model is more amenable. The approach is implemented in an open-source R package named bsnsing. Benchmarking experiments on 75 open data sets suggest that bsnsing trees are the most capable of discriminating new cases compared with trees trained by other decision tree codes including the rpart, C50, party, and tree packages in R. Compared with other optimal decision tree packages, including DL8.5, OSDT, GOSDT, and indirectly more, bsnsing stands out in its training speed, ease of use, and broader applicability without losing in prediction accuracy.
Keywords: classification trees; mixed-integer programming; statistical computing; R (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://dx.doi.org/10.1287/ijoc.2022.1225 (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:inm:orijoc:v:34:y:2022:i:6:p:2908-2929
Access Statistics for this article
More articles in INFORMS Journal on Computing from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().