EconPapers    
Economics at your fingertips  
 

Optimal decision trees for categorical data via integer programming

Oktay Günlük (), Jayant Kalagnanam (), Minhan Li (), Matt Menickelly () and Katya Scheinberg ()
Additional contact information
Oktay Günlük: Cornell University
Jayant Kalagnanam: IBM Research
Minhan Li: Lehigh University
Matt Menickelly: Argonne National Laboratory
Katya Scheinberg: Cornell University

Journal of Global Optimization, 2021, vol. 81, issue 1, No 9, 233-260

Abstract: Abstract Decision trees have been a very popular class of predictive models for decades due to their interpretability and good performance on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if allowed to grow large, they lose interpretability. In this paper, we present a mixed integer programming formulation to construct optimal decision trees of a prespecified size. We take the special structure of categorical features into account and allow combinatorial decisions (based on subsets of values of features) at each node. Our approach can also handle numerical features via thresholding. We show that very good accuracy can be achieved with small trees using moderately-sized training sets. The optimization problems we solve are tractable with modern solvers.

Keywords: Decision trees; Integer programming; Machine learning; Binary classification (search for similar items in EconPapers)
Date: 2021
References: View complete reference list from CitEc
Citations: View citations in EconPapers (4)

Downloads: (external link)
http://link.springer.com/10.1007/s10898-021-01009-y Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:jglopt:v:81:y:2021:i:1:d:10.1007_s10898-021-01009-y

Ordering information: This journal article can be ordered from
http://www.springer. ... search/journal/10898

DOI: 10.1007/s10898-021-01009-y

Access Statistics for this article

Journal of Global Optimization is currently edited by Sergiy Butenko

More articles in Journal of Global Optimization from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-03-20
Handle: RePEc:spr:jglopt:v:81:y:2021:i:1:d:10.1007_s10898-021-01009-y