A new text categorisation strategy: prototype design and experimental analysis

N. Venkata Sailaja, L. Padma Sree and N. Mangathayaru

International Journal of Knowledge and Learning, 2020, vol. 13, issue 2, 146-167

Abstract: Over the past decade, a large amount of text data has been generated by various web sources, both online and offline. Most of this data is inconsistent and unstructured, which makes it hard to process with available computing machines. With the advent of computers and the information age, statistical and analytical problems have also grown in both size and complexity. Text classification with machine learning methods faces the difficulty of a high-dimensional attribute vector. A feature selection technique is therefore required to discard irrelevant and noisy attributes from the feature set so that the ML algorithms can work efficiently. In this paper, a hybrid method is proposed for text document classification. The proposed method's performance is evaluated on standard datasets, i.e., Reuters-21578 and 20 newsgroups. We used the 'bydate' version of the 20 newsgroups dataset, containing 18,941 documents. Our experiments explore various performance measures.

Keywords: text classification; rough sets; RS; information retrieval; feature selection; machine learning; evaluation; 20 newsgroups; Reuters-21578
Date: 2020

Downloads: (external link)
http://www.inderscience.com/link.php?id=106650 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:ids:ijklea:v:13:y:2020:i:2:p:146-167

More articles in International Journal of Knowledge and Learning from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.

 
Page updated 2025-03-19
Handle: RePEc:ids:ijklea:v:13:y:2020:i:2:p:146-167