Addressing topic modelling via reduced latent space clustering

Lorenzo Schiavon
Additional contact information
Lorenzo Schiavon: Ca’ Foscari University of Venice, San Giobbe

Statistical Methods & Applications, 2025, vol. 34, issue 1, No 1, 20 pages

Abstract: In the social sciences, topic modelling is gaining increased attention for its ability to automatically uncover the underlying themes within large corpora of textual data. This process typically involves two key phases: (i) identifying the words associated with language concepts, and (ii) clustering documents that share similar word distributions. In this study, motivated by the growing interest in automatic categorisation of policy documents and regulations, we leverage recent advancements in Bayesian factor models to develop a novel topic modelling approach. This enables us to represent the high-dimensional space defined by all possible observed words through a small set of latent variables, and simultaneously cluster the documents based on their distributions over these latent constructs. Here, groups and underlying constructs are interpreted as document topics and language concepts, respectively, and the number of dimensions need not be specified in advance. Additionally, we demonstrate the effectiveness of our approach using synthetic data, providing a comparison with existing methods in the literature. The illustration of our approach on a corpus of Italian health public plans unveils intriguing patterns concerning the semantic structures used in ageing policies and document topic similarities.

Keywords: Ageing policies; Dirichlet process; Gibbs sampling; Health plans; Infinite factor model; Nonparametric Bayes; Textual data
Date: 2025

Downloads: (external link)
http://link.springer.com/10.1007/s10260-025-00779-z Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:stmapp:v:34:y:2025:i:1:d:10.1007_s10260-025-00779-z

Ordering information: This journal article can be ordered from
http://www.springer. ... cs/journal/10260/PS2

DOI: 10.1007/s10260-025-00779-z

Statistical Methods & Applications is currently edited by Tommaso Proietti

More articles in Statistical Methods & Applications from Springer, Società Italiana di Statistica
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-05-21
Handle: RePEc:spr:stmapp:v:34:y:2025:i:1:d:10.1007_s10260-025-00779-z