LF-LDA: A Supervised Topic Model for Multi-Label Documents Classification
Yongjun Zhang,
Zijian Wang,
Yongtao Yu,
Bolun Chen,
Jialin Ma and
Liang Shi
Additional contact information
Yongjun Zhang: Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, China & College of Computer and Information, Hohai University, Nanjing, China
Zijian Wang: College of Computer and Information, Hohai University, Nanjing, China
Yongtao Yu: Huaiyin Institute of Technology, Huaian, China
Bolun Chen: Huaiyin Institute of Technology, Huaian, China
Jialin Ma: The Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huaiyin Institute of Technology, Huaian, China & College of Computer and Information, Hohai University, Nanjing, China
Liang Shi: Jiangsu Vocational College of Business, Nantong, China
International Journal of Data Warehousing and Mining (IJDWM), 2018, vol. 14, issue 2, 18-36
Abstract:
Text documents are a major form of data in the era of big data, and with the explosive growth of data the number of documents carrying multiple labels has increased dramatically. Popular multi-label classification techniques, which are usually employed to handle multinomial text documents, are sensitive to noise terms in the text, so there is still considerable room for improvement in the multi-label classification of text documents. This article introduces a supervised topic model, named labeled LDA with function terms (LF-LDA), that filters out noisy function terms from text documents and thereby helps improve multi-label classification performance. The article also derives the Gibbs sampling formulas in detail; the derivation can be generalized to other similar topic models. On the textual data set RCV1-v2, the article compares the proposed model with two other state-of-the-art multi-label classifiers, Tuned SVM and labeled LDA, on both Macro-F1 and Micro-F1 metrics. The results show that LF-LDA outperforms both and has the lowest variance, which indicates the robustness of the LF-LDA classifier.
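The two metrics named in the abstract differ in how they aggregate per-label results: Macro-F1 averages the F1 score of each label, so rare labels count as much as frequent ones, while Micro-F1 pools true positives, false positives, and false negatives over all labels before computing a single F1. The following is a minimal sketch of that distinction, not the article's evaluation code; the function names and the toy labels "A", "B", and "C" are assumptions made purely for illustration.

```python
from typing import List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(
    true_labels: List[Set[str]], pred_labels: List[Set[str]]
) -> Tuple[float, float]:
    """Return (Macro-F1, Micro-F1) for multi-label predictions.

    Each list element is the label set of one document.
    """
    labels = sorted(set().union(*true_labels, *pred_labels))
    per_label_f1 = []
    total_tp = total_fp = total_fn = 0
    for label in labels:
        pairs = list(zip(true_labels, pred_labels))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        per_label_f1.append(f1(tp, fp, fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    # Macro: unweighted mean of per-label F1 scores.
    macro = sum(per_label_f1) / len(per_label_f1) if per_label_f1 else 0.0
    # Micro: one F1 computed from counts pooled over all labels.
    micro = f1(total_tp, total_fp, total_fn)
    return macro, micro


# Toy data (hypothetical): three documents, three labels.
true_sets = [{"A", "B"}, {"A"}, {"C"}]
pred_sets = [{"A"}, {"A", "B"}, {"C"}]
macro, micro = macro_micro_f1(true_sets, pred_sets)
print(f"Macro-F1 = {macro:.4f}, Micro-F1 = {micro:.4f}")
```

In this toy case label "B" is never predicted correctly, which pulls Macro-F1 down further than Micro-F1, because the macro average gives that single label a full one-third weight.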
Date: 2018
Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 018/IJDWM.2018040102 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:igg:jdwm00:v:14:y:2018:i:2:p:18-36
International Journal of Data Warehousing and Mining (IJDWM) is currently edited by Eric Pardede
More articles in International Journal of Data Warehousing and Mining (IJDWM) from IGI Global