Effective feature selection technique for text classification

Hari Seetha, M. Narasimha Murty and R. Saravanan

International Journal of Data Mining, Modelling and Management, 2015, vol. 7, issue 3, 165-184

Abstract: Text classification plays a vital role in organising the ever-growing volume of digital documents. The high dimensionality of the feature space is a major obstacle in text classification. Feature selection, an effective preprocessing technique, improves both the computational efficiency and the accuracy of a text classifier. In this paper, text classification is performed with Zipf's law-based feature selection and with linear SVM weights used for feature ranking. A hybrid feature selection method combining these two techniques is proposed. Nearest neighbour and SVM classifiers are chosen as text classifiers because of the good classification accuracy reported for them in many text classification tasks. Moreover, to investigate the effect of kernel type on text classification, both linear and non-linear SVM kernels are examined. Performance is evaluated by classification accuracy using ten-fold cross-validation. Experimental results on four benchmark corpora were encouraging: the hybrid feature selection method outperformed both the selection of medium-frequency features based on Zipf's law and feature selection by linear SVM weights alone.
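
The abstract does not say how the two selection techniques are combined, so the following is only a minimal sketch of one plausible reading: first keep the medium-frequency terms (the Zipf's law-based step), then rank the survivors by the absolute value of their linear SVM weight. The function names, the `min_count`/`max_count` thresholds, the toy documents, and the hard-coded weight values are all assumptions made for illustration; in practice the weights would come from a linear SVM trained on the corpus.

```python
from collections import Counter


def zipf_medium_frequency(docs, min_count, max_count):
    """Keep terms whose corpus frequency lies in [min_count, max_count].

    Very frequent and very rare terms are dropped. The thresholds are
    assumptions; the abstract does not state how the cut-offs are chosen.
    """
    counts = Counter(tok for doc in docs for tok in doc)
    return sorted(t for t, c in counts.items() if min_count <= c <= max_count)


def rank_by_abs_weight(terms, weights, k):
    """Return the k terms with the largest absolute weight (ties by name)."""
    return sorted(terms, key=lambda t: (-abs(weights.get(t, 0.0)), t))[:k]


def hybrid_select(docs, weights, k, min_count, max_count):
    """Assumed hybrid: Zipf-style frequency filter, then SVM-weight ranking."""
    medium = zipf_medium_frequency(docs, min_count, max_count)
    return rank_by_abs_weight(medium, weights, k)


# Toy tokenised corpus (hypothetical).
docs = [
    ["the", "svm", "kernel", "the"],
    ["the", "kernel", "text", "svm"],
    ["the", "text", "zipf", "rare"],
]

# Hypothetical per-term weights standing in for the coefficients of a
# trained linear SVM; these numbers are made up for the example.
weights = {"the": 0.05, "svm": -1.2, "kernel": 0.9,
           "text": 0.1, "zipf": 2.0, "rare": -0.3}

selected = hybrid_select(docs, weights, k=2, min_count=2, max_count=3)
print(selected)
```

In this toy run, "zipf" has the largest absolute weight but is discarded by the frequency filter because it occurs only once, which is the kind of interaction between the two criteria that a hybrid method of this shape would produce.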

Keywords: text classification; SVM classifier; support vector machines; nearest neighbour classifier; feature selection; classification accuracy.
Date: 2015

Downloads: (external link)
http://www.inderscience.com/link.php?id=71451 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:7:y:2015:i:3:p:165-184

More articles in International Journal of Data Mining, Modelling and Management from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.

 
Page updated 2025-03-19
Handle: RePEc:ids:ijdmmm:v:7:y:2015:i:3:p:165-184