A Novel Tagging Augmented LDA Model for Clustering
Yi Zhao, Yu Qiao and Keqing He
Additional contact information
Yi Zhao: School of Computer Science, Wuhan University, Wuhan, China
Yu Qiao: School of Computer Science, Wuhan University, Wuhan, China
Keqing He: School of Computer Science, Wuhan University, Wuhan, China
International Journal of Web Services Research (IJWSR), 2019, vol. 16, issue 3, 59-77
Abstract:
Clustering has become an increasingly important task in the analysis of large document collections. It aims to organize these documents and to facilitate better search and knowledge extraction. Most existing clustering methods that use user-generated tags consider only the positive influence of tags on automatic clustering performance. The authors argue that not all user-generated tags provide useful information for clustering. In this article, the authors propose a new clustering solution, HRT-LDA (High Representation Tags Latent Dirichlet Allocation), which accounts for the effects of different tags on clustering performance. To this end, the authors apply a tag filtering strategy and a tag appending strategy based on transfer learning, Word2vec, TF-IDF and semantic computing. Extensive experiments on real-world datasets demonstrate that HRT-LDA outperforms state-of-the-art tagging augmented LDA methods for clustering.
Date: 2019
Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 018/IJWSR.2019070104 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:igg:jwsr00:v:16:y:2019:i:3:p:59-77
International Journal of Web Services Research (IJWSR) is currently edited by Liang-Jie Zhang