DOLDA: a regularized supervised topic model for high-dimensional multi-class regression

Måns Magnusson, Leif Jonsson and Mattias Villani
Additional contact information
Måns Magnusson: Linköping University
Leif Jonsson: Ericsson AB

Computational Statistics, 2020, vol. 35, issue 1, No 12, 175-201

Abstract: Generating user-interpretable multi-class predictions in data-rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle many classes as well as many covariates. To handle many classes we use the recently proposed Diagonal Orthant probit model (Johndrow et al., in Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, 2013) together with an efficient Horseshoe prior for variable selection/shrinkage (Carvalho et al., Biometrika 97:465–480, 2010). We propose a computationally efficient parallel Gibbs sampler for the new model. An important advantage of DOLDA is that learned topics are directly connected to individual classes without the need for a reference class. We evaluate the model’s predictive accuracy and scalability, and demonstrate DOLDA’s advantage in interpreting the generated predictions.

Keywords: Text classification; Latent Dirichlet Allocation; Horseshoe prior; Diagonal Orthant probit model; Interpretable models
Date: 2020

Downloads: (external link)
http://link.springer.com/10.1007/s00180-019-00891-1 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:compst:v:35:y:2020:i:1:d:10.1007_s00180-019-00891-1

Ordering information: This journal article can be ordered from
http://www.springer.com/statistics/journal/180/PS2

DOI: 10.1007/s00180-019-00891-1

Computational Statistics is currently edited by Wataru Sakamoto, Ricardo Cao and Jürgen Symanzik

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Page updated 2025-03-24
Handle: RePEc:spr:compst:v:35:y:2020:i:1:d:10.1007_s00180-019-00891-1