A systematic review on techniques of feature selection and classification for text mining
K. Sridharan and P. Sivakumar
International Journal of Business Information Systems, 2018, vol. 28, issue 4, 504-518
Abstract:
Internet use is growing rapidly. Users can share and consume large amounts of structured, semi-structured and unstructured content on the internet, such as videos, images, audio and text. Text mining mainly involves the following steps: pre-processing, feature dimension reduction (feature selection or feature extraction), and text classification or clustering on the final features. In this paper, the pre-processing step uses a context-sensitive stemmer to remove stop words and different suffixes in order to reduce the word count. Unsupervised and supervised feature selection methods such as document frequency, term strength, chi-square and information gain are compared to identify the best method for web document feature selection. Classification techniques such as latent semantic analysis, genetic algorithms, Rocchio's algorithm and neural networks are also compared through a systematic review.
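As an illustration of two of the feature selection scores named in the abstract, the following is a minimal sketch of document frequency (unsupervised) and information gain (supervised) in plain Python. It is not the authors' implementation: the whitespace tokenizer, the four-document toy corpus, the class labels and all function names are assumptions made only for this example, and information gain is computed for the binary feature "term present in document".

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop-word removal.
    return text.lower().split()


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Reduction in class entropy from knowing whether `term` occurs in a document."""
    present, absent = [], []
    for doc, label in zip(docs, labels):
        (present if term in set(tokenize(doc)) else absent).append(label)
    h_class = entropy(list(Counter(labels).values()))
    h_conditional = 0.0
    for subset in (present, absent):
        if subset:
            weight = len(subset) / len(docs)
            h_conditional += weight * entropy(list(Counter(subset).values()))
    return h_class - h_conditional


# Hypothetical toy corpus, used only to exercise the functions above.
docs = [
    "stock market prices rise",
    "market shares fall",
    "team wins football match",
    "football season starts",
]
labels = ["finance", "finance", "sport", "sport"]

df = document_frequency(docs)
ranked = sorted(df, key=lambda t: information_gain(docs, labels, t), reverse=True)
```

In this sketch a term such as "market", which occurs in every finance document and in no sport document, receives the maximum information gain for a two-class problem, while a term that occurs in a single document scores lower. Document frequency, by contrast, ignores the labels and only ranks terms by how many documents contain them.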
Keywords: information gain; IG; document frequency; DF; term strength; TS; artificial neural network; latent semantic analysis; LSA; text mining; stemming.
Date: 2018
Citations: View citations in EconPapers (2)
Downloads: (external link)
http://www.inderscience.com/link.php?id=93659 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijbisy:v:28:y:2018:i:4:p:504-518
More articles in International Journal of Business Information Systems from Inderscience Enterprises Ltd