Moment-Based Inference for Regression with Latent Dirichlet Covariates
Ziyu Jiang
Papers from arXiv.org
Abstract:
Topic models are often used as dimension-reduction tools before regression, with estimated document-level topic shares treated as observed covariates. This plug-in workflow creates two inferential difficulties: valid inference requires a regular first-stage-to-second-stage expansion that propagates topic-estimation uncertainty, and, at fixed document length, a document's topic mixture cannot be consistently recovered from its own words even when the population topic matrix is known. Corrected spectral moment methods for latent Dirichlet allocation (LDA) offer a starting point: when the total Dirichlet concentration is known, low-order word moments can be corrected to yield operators diagonal in the latent topic basis. We extend this to downstream regression. Under a finite LDA model with response residuals orthogonal to the low-order token moments used for identification, response-weighted word moments admit the same correction, and the resulting supervised operator identifies the regression coefficient $\beta$ directly, without estimating document-level topic shares. The main obstacle is that the correction depends on the unknown total concentration $\alpha_0$. We show that, for $k\ge3$ topics and under a generic finite-probe condition, $\alpha_0$ is identified by commutativity: at the true value a family of corrected word-moment operators commute, whereas away from it they generically do not. This yields a feasible estimator and lets uncertainty in $\hat\alpha_0$ propagate into inference for $\beta$. The estimator is asymptotically linear as the number of documents grows with fixed document length, with sandwich standard errors from document-level moment contributions. Simulations show near-nominal coverage where plug-in topic-share regressions can undercover, and an application to top economics journals illustrates contrast inference for latent topic effects.
Date: 2026-05
References: Add references at CitEc
Citations:
Downloads: (external link)
http://arxiv.org/pdf/2605.30718 Latest version (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2605.30718
Access Statistics for this paper
More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().