Entry-Wise Eigenvector Analysis and Improved Rates for Topic Modeling on Short Documents

Zheng Tracy Ke and Jingming Wang
Additional contact information
Zheng Tracy Ke: Department of Statistics, Harvard University, Cambridge, MA 02138, USA
Jingming Wang: Department of Statistics, Harvard University, Cambridge, MA 02138, USA

Mathematics, 2024, vol. 12, issue 11, 1-41

Abstract: Topic modeling is a widely used tool in text analysis. We investigate the optimal rate for estimating a topic model. Specifically, we consider a scenario with n documents, a vocabulary of size p, and document lengths of order N. When N ≥ c·p, referred to as the long-document case, the optimal rate has been established in the literature as p/(Nn). However, when N = o(p), referred to as the short-document case, the optimal rate remains unknown. In this paper, we first provide new entry-wise large-deviation bounds for the empirical singular vectors of a topic model. We then apply these bounds to improve the error rate of a spectral algorithm, Topic-SCORE. Finally, by comparing the improved error rate with the minimax lower bound, we conclude that the optimal rate is still p/(Nn) in the short-document case.
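
Illustration (not part of the original record): a minimal sketch of the setting the abstract describes, with n documents, a vocabulary of size p, and documents of length N much smaller than p. The number of topics K, the Dirichlet draws for the topic matrix A and weight matrix W, and the function name simulate_corpus are assumptions made for this example only; the code computes plain empirical singular vectors and does not implement Topic-SCORE or its pre-SVD normalization.

```python
import numpy as np

def simulate_corpus(n, p, N, K, seed=0):
    """Draw word counts from a simple topic model (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    # Topic matrix A (p x K): each column is a distribution over the vocabulary.
    A = rng.dirichlet(np.ones(p), size=K).T
    # Weight matrix W (K x n): each column holds one document's topic weights.
    W = rng.dirichlet(np.ones(K), size=n).T
    # Population word frequencies (p x n); renormalize to guard against rounding.
    D0 = A @ W
    D0 = D0 / D0.sum(axis=0, keepdims=True)
    # Each document is N words drawn from its own column of D0.
    X = np.column_stack([rng.multinomial(N, D0[:, i]) for i in range(n)])
    return A, W, X

# Short documents: N is much smaller than the vocabulary size p.
n, p, N, K = 200, 1000, 50, 3
A, W, X = simulate_corpus(n, p, N, K)

# Empirical word-frequency matrix and its leading K left singular vectors,
# the objects whose entries the paper's large-deviation bounds concern.
D = X / N
U, s, Vt = np.linalg.svd(D, full_matrices=False)
Xi = U[:, :K]

# The quantity p / (N n) quoted in the abstract, for these sizes.
rate = p / (N * n)
print(f"N/p = {N / p:.3f}, p/(Nn) = {rate:.3f}")
```

With N = 50 and p = 1000, most words never appear in a given document, which is the sparsity that makes the short-document case harder to analyze.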

Keywords: decoupling inequality; entry-wise eigenvector analysis; pre-SVD normalization; sine-theta theorem; topic-SCORE; word frequency heterogeneity
JEL-codes: C
Date: 2024

Downloads: (external link)
https://www.mdpi.com/2227-7390/12/11/1682/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/11/1682/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:11:p:1682-:d:1403981

Mathematics is currently edited by Ms. Emma He

Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:12:y:2024:i:11:p:1682-:d:1403981