A lexical-semantics-based method for multi label text categorisation using word net
Rajni Jindal and Shweta Taneja
International Journal of Data Mining, Modelling and Management, 2017, vol. 9, issue 4, 340-360
Abstract:
Text categorisation is an emerging area in the field of text mining. Text documents possess a huge number of features owing to their unstructured nature. In this paper, an algorithm for multi label categorisation of text documents based on lexical and semantic concepts using word net (MC-LSW) is proposed. The algorithm works with the tokens (lexical units) of a language and their semantics, and aims at minimising the number of tokens used for categorising text documents. MC-LSW uses word net to extract the semantic information of tokens. The proposed algorithm is implemented and tested on five text datasets and compared with existing multi label categorisation algorithms. MC-LSW shows more efficient and promising results in terms of space and time complexity than the existing methods. It also improves accuracy and precision and reduces Hamming loss.
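The abstract reports Hamming loss as one of its evaluation measures. For readers unfamiliar with it, the following is a minimal sketch of the standard definition: the fraction of document-label slots where the predicted label set disagrees with the true one. The function name, the binary indicator encoding and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (document, label) slots where prediction and truth differ.

    Each row is a binary indicator vector: 1 if the label applies to the
    document, 0 otherwise. This encoding is an assumption of this sketch.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of documents")
    if not y_true or not y_true[0]:
        raise ValueError("at least one document and one label are required")

    n_labels = len(y_true[0])
    mismatches = 0
    for truth, pred in zip(y_true, y_pred):
        if len(truth) != n_labels or len(pred) != n_labels:
            raise ValueError("every row must have the same number of labels")
        # Count the label positions on which this document's prediction is wrong.
        mismatches += sum(t != p for t, p in zip(truth, pred))

    return mismatches / (len(y_true) * n_labels)


# Toy data (hypothetical): 3 documents, 4 candidate labels each.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 0, 1]]
loss = hamming_loss(y_true, y_pred)
print(f"Hamming loss: {loss:.3f}")
```

Lower values are better: a loss of 0 means every label of every document was predicted correctly, which is why the abstract describes a reduction in Hamming loss as an improvement.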
Keywords: multi label text categorisation; lexical analysis; semantic analysis; word net.
Date: 2017
Downloads: (external link)
http://www.inderscience.com/link.php?id=88412 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:9:y:2017:i:4:p:340-360
More articles in International Journal of Data Mining, Modelling and Management from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.