Optimal decision trees for categorical data via integer programming
Oktay Günlük (),
Jayant Kalagnanam (),
Minhan Li (),
Matt Menickelly () and
Katya Scheinberg ()
Additional contact information
Oktay Günlük: Cornell University
Jayant Kalagnanam: IBM Research
Minhan Li: Lehigh University
Matt Menickelly: Argonne National Laboratory
Katya Scheinberg: Cornell University
Journal of Global Optimization, 2021, vol. 81, issue 1, No 9, 233-260
Abstract:
Abstract Decision trees have been a very popular class of predictive models for decades due to their interpretability and good performance on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if allowed to grow large, they lose interpretability. In this paper, we present a mixed integer programming formulation to construct optimal decision trees of a prespecified size. We take the special structure of categorical features into account and allow combinatorial decisions (based on subsets of values of features) at each node. Our approach can also handle numerical features via thresholding. We show that very good accuracy can be achieved with small trees using moderately-sized training sets. The optimization problems we solve are tractable with modern solvers.
Keywords: Decision trees; Integer programming; Machine learning; Binary classification (search for similar items in EconPapers)
Date: 2021
References: View complete reference list from CitEc
Citations: View citations in EconPapers (4)
Downloads: (external link)
http://link.springer.com/10.1007/s10898-021-01009-y Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:jglopt:v:81:y:2021:i:1:d:10.1007_s10898-021-01009-y
Ordering information: This journal article can be ordered from
http://www.springer. ... search/journal/10898
DOI: 10.1007/s10898-021-01009-y
Access Statistics for this article
Journal of Global Optimization is currently edited by Sergiy Butenko
More articles in Journal of Global Optimization from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().