EconPapers    
Economics at your fingertips  
 

CLDA: An Effective Topic Model for Mining User Interest Preference under Big Data Background

Lirong Qiu and Jia Yu

Complexity, 2018, vol. 2018, 1-10

Abstract:

In the present big data background, how to effectively excavate useful information is the problem that big data is facing now. The purpose of this study is to construct a more effective method of mining interest preferences of users in a particular field in the context of today’s big data. We mainly use a large number of user text data from microblog to study. LDA is an effective method of text mining, but it will not play a very good role in applying LDA directly to a large number of short texts in microblog. In today’s more effective topic modeling project, short texts need to be aggregated into long texts to avoid data sparsity. However, aggregated short texts are mixed with a lot of noise, reducing the accuracy of mining the user’s interest preferences. In this paper, we propose Combining Latent Dirichlet Allocation (CLDA), a new topic model that can learn the potential topics of microblog short texts and long texts simultaneously. The data sparsity of short texts is avoided by aggregating long texts to assist in learning short texts. Short text filtering long text is reused to improve mining accuracy, making long texts and short texts effectively combined. Experimental results in a real microblog data set show that CLDA outperforms many advanced models in mining user interest, and we also confirm that CLDA also has good performance in recommending systems.

Date: 2018
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
http://downloads.hindawi.com/journals/8503/2018/2503816.pdf (application/pdf)
http://downloads.hindawi.com/journals/8503/2018/2503816.xml (text/xml)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:hin:complx:2503816

DOI: 10.1155/2018/2503816

Access Statistics for this article

More articles in Complexity from Hindawi
Bibliographic data for series maintained by Mohamed Abdelhakeem ().

 
Page updated 2025-03-19
Handle: RePEc:hin:complx:2503816