ldagibbs: A command for topic modeling in Stata using latent Dirichlet allocation
Carlo Schwarz
Stata Journal, 2018, vol. 18, issue 1, 101-117
Abstract:
In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.
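The two distributions described in the abstract can be illustrated with a small collapsed Gibbs sampler. This is a minimal sketch in Python, not the ldagibbs implementation: the function name `lda_gibbs`, the prior values `alpha` and `beta`, the iteration count, and the toy documents are all assumptions made for illustration. It returns `theta` (one topic distribution per document) and `phi` (one word distribution per topic).

```python
import random

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA on tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: tokens per (document, topic), per (topic, word), per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token,
                # up to a normalizing constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates: each row of theta and of phi sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi, vocab

# Toy corpus (hypothetical): two fiscal documents, two sports documents.
docs = [
    "tax budget deficit tax spending".split(),
    "budget spending tax deficit".split(),
    "goal match team goal player".split(),
    "team player match goal".split(),
]
theta, phi, vocab = lda_gibbs(docs, n_topics=2)
```

Here `theta[d]` is the estimated topic distribution of document `d` and `phi[k]` is the estimated word distribution of topic `k`, indexed in the order of `vocab`. The number of topics is supplied by the user, as the abstract describes.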
Keywords: ldagibbs; machine learning; latent Dirichlet allocation; Gibbs sampling; topic model; text analysis
Date: 2018
Note: To access the software from within Stata, type net describe http://www.stata-journal.com/software/sj18-1/st0515/
Downloads: http://www.stata-journal.com/article.html?article=st0515 (link to article purchase)
Persistent link: https://EconPapers.repec.org/RePEc:tsj:stataj:v:18:y:2018:i:1:p:101-117
Ordering information: This journal article can be ordered from
http://www.stata-journal.com/subscription.html
Stata Journal is currently edited by Nicholas J. Cox and Stephen P. Jenkins
More articles in Stata Journal from StataCorp LLC
Bibliographic data for series maintained by Christopher F. Baum and Lisa Gilmore.