Robust machine learning algorithms for text analysis
Shikun Ke, José Luis Montiel Olea and James Nesbit
Quantitative Economics, 2024, vol. 15, issue 4, 939-970
Abstract:
We study the Latent Dirichlet Allocation model, a popular Bayesian algorithm for text analysis. We show that the model's parameters are not identified, which suggests that the choice of prior matters. We characterize the range of values that the posterior mean of a given functional of the model's parameters can attain in response to a change in the prior, and we suggest two algorithms that report this range. Both of our algorithms rely on obtaining multiple Nonnegative Matrix Factorizations of either the posterior draws of the corpus' population term‐document frequency matrix or of its maximum likelihood estimator. The key idea is to maximize/minimize the functional of interest over all these nonnegative matrix factorizations. To illustrate the applicability of our results, we revisit recent work studying the effects of increased transparency on the communication structure of monetary policy discussions in the United States.
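The key idea in the abstract, taking the minimum and maximum of a functional over many nonnegative matrix factorizations of the term-document frequency matrix, can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a small, hand-made term-document probability matrix `p`, uses plain multiplicative-update NMF from random starting points as a stand-in for whatever factorization routine the paper uses, and picks a hypothetical functional (the Herfindahl index of topic shares in one document). All function names are illustrative.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(p, k, rng, iters=1000, eps=1e-12):
    """Approximate p (V x D) by w (V x k) times h (k x D), all nonnegative,
    using multiplicative updates from a random positive start."""
    v, d = len(p), len(p[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(v)]
    h = [[rng.random() + 0.1 for _ in range(d)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, p), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(d)]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(p, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(v)]
    return w, h

def normalize(w, h):
    """Rescale so every column of b sums to one (a distribution over terms);
    theta absorbs the scale, so the product b * theta is unchanged."""
    k = len(h)
    scale = [sum(row[j] for row in w) for j in range(k)]
    b = [[row[j] / scale[j] for j in range(k)] for row in w]
    theta = [[scale[i] * x for x in h[i]] for i in range(k)]
    return b, theta

def herfindahl(theta, doc):
    """Hypothetical functional: sum of squared topic shares in one document."""
    return sum(theta[i][doc] ** 2 for i in range(len(theta)))

def functional_range(p, k, n_starts=10, tol=1e-2, seed=0):
    """Smallest and largest functional value over the factorizations found."""
    rng = random.Random(seed)
    rows, cols = len(p), len(p[0])
    values = []
    for _ in range(n_starts):
        b, theta = normalize(*nmf(p, k, rng))
        fit = matmul(b, theta)
        err = max(abs(fit[i][j] - p[i][j]) for i in range(rows) for j in range(cols))
        if err < tol:  # keep only factorizations that reproduce p closely
            values.append(herfindahl(theta, doc=0))
    if not values:
        raise ValueError("no factorization met the tolerance")
    return min(values), max(values)

# Made-up example: 4 terms, 3 documents, 2 topics; each column of p sums to one.
b0 = [[0.5, 0.1], [0.3, 0.1], [0.1, 0.3], [0.1, 0.5]]
theta0 = [[0.8, 0.5, 0.2], [0.2, 0.5, 0.8]]
p = matmul(b0, theta0)

lo, hi = functional_range(p, k=2)
print(f"Herfindahl index of document 0 ranges over [{lo:.3f}, {hi:.3f}]")
```

Because random restarts visit only some of the factorizations consistent with `p`, the interval printed here lies inside the full range the abstract refers to and may understate it. A gap between `lo` and `hi` is a symptom of the non-identification described above: different factorizations reproduce the same matrix yet imply different topic shares.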
Date: 2024
Download: https://doi.org/10.3982/QE1825
Persistent link: https://EconPapers.repec.org/RePEc:wly:quante:v:15:y:2024:i:4:p:939-970
Ordering information: This journal article can be ordered from
https://www.econometricsociety.org/membership
Quantitative Economics is a journal of the Econometric Society.