Sparse Density Trees and Lists: An Interpretable Alternative to High-Dimensional Histograms

Siong Thye Goh, Lesia Semenova and Cynthia Rudin
Additional contact information
Siong Thye Goh: Lee Kong Chian School of Business, Singapore Management University, Singapore 178899
Lesia Semenova: Department of Computer Science, Duke University, Durham, North Carolina 27708
Cynthia Rudin: Department of Computer Science, Duke University, Durham, North Carolina 27708

INFORMS Journal on Data Science, 2024, vol. 3, issue 1, 28-48

Abstract: We present sparse tree-based and list-based density estimation methods for binary/categorical data. Our density estimation models are higher-dimensional analogies to variable bin-width histograms. In each leaf of the tree (or list), the density is constant, similar to the flat density within the bin of a histogram. Histograms, however, cannot easily be visualized in more than two dimensions, whereas our models can. The accuracy of histograms fades as dimensions increase, whereas our models have priors that help with generalization. Our models are sparse, unlike high-dimensional fixed-bin histograms. We present three generative modeling methods, where the first one allows the user to specify the preferred number of leaves in the tree within a Bayesian prior. The second method allows the user to specify the preferred number of branches within the prior. The third method returns density lists (rather than trees) and allows the user to specify the preferred number of rules and the length of rules within the prior. The new approaches often yield a better balance between sparsity and accuracy of density estimates than other methods for this task. We present an application to crime analysis, where we estimate how unusual each type of modus operandi is for a house break-in.
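
As a rough illustration of the "constant density in each leaf" idea from the abstract, the sketch below computes histogram-style leaf densities for binary data under a fixed, hand-written tree. It is not the authors' method: the learning of the tree and the Bayesian priors are omitted, and the function name `leaf_density`, the leaf encoding as a dict, and the volume convention (number of binary points a leaf covers) are all assumptions made for this example.

```python
def leaf_density(data, leaf, n_features):
    """Constant density inside one leaf of a tree over binary features.

    `leaf` is a hypothetical encoding: a dict mapping a feature index to the
    value (0 or 1) that the leaf requires for that feature.
    """
    # Count the observations that satisfy every condition on the path to the leaf.
    n_in_leaf = sum(all(row[j] == v for j, v in leaf.items()) for row in data)
    # Assumed volume: the number of distinct binary points the leaf covers.
    volume = 2 ** (n_features - len(leaf))
    # Fraction of the data in the leaf, spread evenly over its volume.
    return (n_in_leaf / len(data)) / volume


# Toy data with three binary features (made up for illustration).
data = [(1, 0, 0), (1, 0, 1), (1, 1, 0), (0, 0, 0)]

# A small tree with three leaves that partition the space:
# x0 = 1; x0 = 0 and x1 = 0; x0 = 0 and x1 = 1.
leaves = [{0: 1}, {0: 0, 1: 0}, {0: 0, 1: 1}]

densities = [leaf_density(data, leaf, 3) for leaf in leaves]
volumes = [2 ** (3 - len(leaf)) for leaf in leaves]
total_mass = sum(d * v for d, v in zip(densities, volumes))
```

Because the leaves partition the feature space, the densities weighted by leaf volume sum to one, just as bin heights times bin widths do in an ordinary histogram.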

Keywords: density estimation; tree-based models; histogram; interpretability
Date: 2024

Downloads: (external link)
http://dx.doi.org/10.1287/ijds.2021.0001 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:orijds:v:3:y:2024:i:1:p:28-48

More articles in INFORMS Journal on Data Science from INFORMS.
Bibliographic data for series maintained by Chris Asher.

 
Page updated 2025-03-19
Handle: RePEc:inm:orijds:v:3:y:2024:i:1:p:28-48