
Hidden Markov model with Pitman-Yor priors for probabilistic topic model

Jianjie Guo, Lin Guo, Wenchao Xu and Haibin Zhang

Communications in Statistics - Theory and Methods, 2025, vol. 54, issue 9, 2791-2805

Abstract: Empirical studies of natural language have shown that word frequencies follow power-law distributions, a property that standard statistical models often fail to capture. The Pitman-Yor process (PYP), a Bayesian nonparametric model capable of generating power-law distributions, has been widely used in probabilistic topic models to handle data with an infinite number of components. However, existing PYP topic models rarely account for the relationships between topics. To address this limitation, we propose a probabilistic topic model that combines a hidden Markov model (HMM), a popular tool for modeling topic relationships, with Pitman-Yor priors. Posterior inference is performed using variational Bayes. We apply the method to text categorization and compare it with two related topic models: the hidden Markov topic model and the hierarchical PYP topic model.
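The power-law property the abstract attributes to the PYP can be seen in a minimal sketch of the standard two-parameter (Pitman-Yor) Chinese restaurant process. This is an illustration only, not the authors' HMM-PYP model or their variational inference; the function name `pitman_yor_crp` and the parameter values are hypothetical choices for the example. With discount 0 the process reduces to the Dirichlet process, whose number of occupied tables grows only logarithmically in the number of customers n. A positive discount d makes it grow roughly like n^d, producing the long tail of many small tables that mirrors rare words in text.

```python
import random


def pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Seat customers by the two-parameter (Pitman-Yor) Chinese restaurant process.

    Illustrative sketch (hypothetical helper, not from the paper).
    Returns a list of table sizes. Requires 0 <= discount < 1 and
    concentration > -discount.
    """
    if not (0.0 <= discount < 1.0) or concentration <= -discount:
        raise ValueError("need 0 <= discount < 1 and concentration > -discount")
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for _ in range(n_customers):
        if not tables:
            # The first customer always opens a new table.
            tables.append(1)
            continue
        # Existing table k gets weight (n_k - d); a new table gets
        # (theta + d * K). These sum to theta + n, the PYP normalizer.
        weights = [size - discount for size in tables]
        weights.append(concentration + discount * len(tables))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
    return tables


if __name__ == "__main__":
    n = 5000
    for d in (0.0, 0.5):
        sizes = sorted(pitman_yor_crp(n, discount=d, concentration=1.0), reverse=True)
        print(f"discount={d}: {len(sizes)} tables, largest five {sizes[:5]}")
```

Comparing the two runs shows far more tables, most of them small, when the discount is positive. In a PYP topic model the same mechanism gives word distributions a heavier tail than a Dirichlet prior would.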

Date: 2025

Downloads:
http://hdl.handle.net/10.1080/03610926.2024.2370920
Access to full text is restricted to subscribers.



Persistent link: https://EconPapers.repec.org/RePEc:taf:lstaxx:v:54:y:2025:i:9:p:2791-2805

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/lsta20

DOI: 10.1080/03610926.2024.2370920


Communications in Statistics - Theory and Methods is currently edited by Debbie Iscoe

More articles in Communications in Statistics - Theory and Methods from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst ().

 
Page updated 2025-04-03
Handle: RePEc:taf:lstaxx:v:54:y:2025:i:9:p:2791-2805