A systematic review on techniques of feature selection and classification for text mining

Sridharan, K.; Sivakumar, P.

International Journal of Business Information Systems, 2018, vol. 28, issue 4, 504-518

Abstract: Internet use is growing rapidly, and users share and consume large amounts of structured, semi-structured and unstructured content such as video, images, audio and text. The main stages of text mining are pre-processing, feature dimension reduction (feature selection or feature extraction), and classification or clustering on the final features. In this paper, the pre-processing step uses a context-sensitive stemmer to remove stop words and various suffixes, thereby reducing the word count. Unsupervised and supervised feature selection methods, such as document frequency, term strength, chi-square and information gain, are compared to identify the best method for feature selection in web documents. Classification techniques such as latent semantic analysis, genetic algorithms, Rocchio's algorithm and neural networks are also compared through a systematic review.
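The pre-processing stage described above (stop-word removal followed by suffix stripping) can be illustrated with a minimal Python sketch. The stop-word list, the suffix list and the `min_stem` threshold are hypothetical, and the naive suffix stripper stands in for, and is much cruder than, the context-sensitive stemmer the authors use.

```python
import re

# Hypothetical, deliberately tiny stop-word list (real systems use larger lists).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "on", "for", "by"}

# Hypothetical suffixes, checked in order; longer forms precede their shorter
# variants. This crude stripper is an assumption, NOT the paper's stemmer.
SUFFIXES = ("ations", "ation", "ings", "ing", "ness", "ies", "ed", "es", "s")


def stem(word: str, min_stem: int = 3) -> str:
    """Remove the first matching suffix if at least `min_stem` letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lower-case, split into alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(tok) for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The users are sharing images and texts on the internet"))
    # ['user', 'shar', 'imag', 'text', 'internet']
```

Over-stemmed forms such as `shar` and `imag` show why a context-sensitive stemmer, which can take surrounding words into account, is preferable to blind suffix removal.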
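Three of the feature selection scores named in the abstract can be computed on a toy labelled corpus. The sketch below assumes each document is already reduced to a set of terms and follows the standard textbook definitions: document frequency (DF, unsupervised), the chi-square statistic from a 2-by-2 term/class contingency table, and information gain (IG, in bits). The corpus `DOCS`/`LABELS` is invented for illustration, and the paper's exact formulations and thresholds may differ.

```python
import math
from collections import Counter

# Invented toy corpus: each document is already reduced to a set of terms.
DOCS = [
    {"goal", "match", "team"},
    {"team", "score", "match"},
    {"stock", "market", "price"},
    {"price", "market", "trade"},
]
LABELS = ["sport", "sport", "finance", "finance"]


def document_frequency(docs: list[set[str]]) -> Counter:
    """DF(t): number of documents containing term t (ignores class labels)."""
    df = Counter()
    for terms in docs:
        df.update(terms)
    return df


def chi_square(docs, labels, term, cls) -> float:
    """Chi-square between presence of `term` and membership in class `cls`."""
    n = len(docs)
    a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == cls)      # t, c
    b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != cls)      # t, not c
    c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == cls)  # not t, c
    d = n - a - b - c                                                           # not t, not c
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def _entropy(counts) -> float:
    """Shannon entropy (bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(k / total * math.log2(k / total) for k in counts if k) if total else 0.0


def information_gain(docs, labels, term) -> float:
    """IG(t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t)."""
    n = len(docs)
    with_t = Counter(y for doc, y in zip(docs, labels) if term in doc)
    without_t = Counter(y for doc, y in zip(docs, labels) if term not in doc)
    p_t = sum(with_t.values()) / n
    return (_entropy(Counter(labels).values())
            - p_t * _entropy(with_t.values())
            - (1 - p_t) * _entropy(without_t.values()))


if __name__ == "__main__":
    df = document_frequency(DOCS)
    vocab = sorted(set().union(*DOCS))
    # Rank terms by IG, highest first, and show all three scores.
    for term in sorted(vocab, key=lambda t: -information_gain(DOCS, LABELS, t)):
        print(term, df[term],
              round(chi_square(DOCS, LABELS, term, "sport"), 3),
              round(information_gain(DOCS, LABELS, term), 3))
```

Terms that occur in only one class (`match`, `market`) receive the maximum IG of 1 bit here, while rare terms such as `goal` score lower on all three measures; selection then keeps the top-ranked terms.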
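Term strength (TS), the other unsupervised score in the abstract, estimates how likely a term is to appear in the second document of a pair of related documents given that it appears in the first, TS(t) = P(t in y | t in x). A minimal sketch follows. As an assumption, "related" pairs are defined here by Jaccard similarity of term sets above a hypothetical `threshold`; formulations in the literature commonly use cosine similarity of weighted document vectors instead. The toy corpus is invented.

```python
from collections import Counter
from itertools import permutations

# Invented toy corpus: each document is already reduced to a set of terms.
DOCS = [
    {"goal", "match", "team"},
    {"team", "score", "match"},
    {"stock", "market", "price"},
    {"price", "market", "trade"},
]


def jaccard(x: set[str], y: set[str]) -> float:
    """Term-set overlap (an assumed stand-in for cosine similarity)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def term_strength(docs: list[set[str]], threshold: float = 0.2) -> dict[str, float]:
    """TS(t) = P(t in y | t in x) over ordered pairs (x, y) of related documents.

    Terms that never occur in any related pair get no estimate (absent from result).
    """
    # Hypothetical relatedness rule: Jaccard similarity >= threshold.
    related = [(x, y) for x, y in permutations(docs, 2) if jaccard(x, y) >= threshold]
    in_first, in_both = Counter(), Counter()
    for x, y in related:
        for term in x:
            in_first[term] += 1
            if term in y:
                in_both[term] += 1
    return {term: in_both[term] / in_first[term] for term in in_first}


if __name__ == "__main__":
    for term, score in sorted(term_strength(DOCS).items()):
        print(term, round(score, 2))
```

Because no class labels are used, TS can be applied to unlabelled web documents; the choice of similarity measure and threshold strongly affects which pairs count as related.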
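Rocchio's algorithm, one of the classifiers compared in the abstract, builds one prototype vector per class from that class's documents minus a down-weighted contribution from the other classes, then assigns a new document to the most similar prototype. The sketch below assumes L2-normalised term-frequency vectors (no IDF weighting), hypothetical weights `beta=1.0` and `gamma=0.25`, and cosine similarity for the decision; the training data is invented and this is not presented as the paper's configuration.

```python
import math
from collections import Counter, defaultdict

# Invented toy training data: token lists after pre-processing.
TRAIN = [
    ["goal", "match", "team"],
    ["team", "score", "match"],
    ["stock", "market", "price"],
    ["price", "market", "trade"],
]
LABELS = ["sport", "sport", "finance", "finance"]


def tf_vector(tokens: list[str]) -> dict[str, float]:
    """L2-normalised term-frequency vector (no IDF weighting, by assumption)."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()} if norm else {}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train_rocchio(docs, labels, beta=1.0, gamma=0.25):
    """Prototype(c) = beta * mean(vectors in c) - gamma * mean(vectors not in c)."""
    vectors = [tf_vector(doc) for doc in docs]
    prototypes = {}
    for cls in sorted(set(labels)):  # sorted for a deterministic order
        pos = [v for v, y in zip(vectors, labels) if y == cls]
        neg = [v for v, y in zip(vectors, labels) if y != cls]
        proto = defaultdict(float)
        for group, weight in ((pos, beta), (neg, -gamma)):
            for vec in group:
                for term, w in vec.items():
                    proto[term] += weight * w / len(group)
        prototypes[cls] = dict(proto)
    return prototypes


def classify(prototypes, tokens):
    """Return the class whose prototype is most cosine-similar to the document."""
    vec = tf_vector(tokens)
    return max(prototypes, key=lambda cls: cosine(vec, prototypes[cls]))


if __name__ == "__main__":
    protos = train_rocchio(TRAIN, LABELS)
    print(classify(protos, ["match", "team", "win"]))  # sport
```

The negative `gamma` term pushes each prototype away from terms typical of other classes; in practice the inputs would be the features kept after feature selection.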

Keywords: information gain; IG; document frequency; DF; term strength; TS; artificial neural network; latent semantic analysis; LSA; text mining; stemming.
Date: 2018

Downloads: http://www.inderscience.com/link.php?id=93659 (access to full text is restricted to subscribers)

Persistent link: https://EconPapers.repec.org/RePEc:ids:ijbisy:v:28:y:2018:i:4:p:504-518

International Journal of Business Information Systems is published by Inderscience Enterprises Ltd.
Bibliographic data for series maintained by Sarah Parker.