
An Improved LDA Topic Modeling Method Based on Partition for Medium and Long Texts

Chonghui Guo, Menglin Lu and Wei Wei
Additional contact information
Chonghui Guo: Dalian University of Technology
Menglin Lu: Dalian University of Technology
Wei Wei: Zhengzhou University

Annals of Data Science, 2021, vol. 8, issue 2, No 8, 344 pages

Abstract: Latent Dirichlet Allocation (LDA) is a topic model that represents a document as a distribution over multiple topics and each topic as a distribution over multiple words, mining the semantic relationships hidden in text. However, traditional LDA ignores some of the semantic features hidden in the internal semantic structure of medium and long texts. Instead of using the original LDA to model topics at the document level, it is better to refine the document into different semantic topic units. In this paper, we propose an improved LDA topic model based on partition (LDAP) for medium and long texts. LDAP not only preserves the benefits of the original LDA but also refines the modeling granularity from the document level to the semantic topic level, which makes it particularly suitable for topic modeling of medium and long texts. Extensive classification experiments on the Fudan University corpus and the Sougou Lab corpus demonstrate that LDAP achieves better performance than other topic models, such as LDA, HDP, LSA and doc2vec.

Keywords: Topic model; Latent Dirichlet allocation; Semantic topic unit; Text representation
Date: 2021

Downloads: (external link)
http://link.springer.com/10.1007/s40745-019-00218-3 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:aodasc:v:8:y:2021:i:2:d:10.1007_s40745-019-00218-3

Ordering information: This journal article can be ordered from
https://www.springer ... gement/journal/40745

DOI: 10.1007/s40745-019-00218-3

Annals of Data Science is currently edited by Yong Shi

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:aodasc:v:8:y:2021:i:2:d:10.1007_s40745-019-00218-3