EconPapers    
Economics at your fingertips  
 

Distance Variance Score: An Efficient Feature Selection Method in Text Classification

Heyong Wang and Ming Hong

Mathematical Problems in Engineering, 2015, vol. 2015, 1-10

Abstract:

With the rapid development of web applications such as social network, a large amount of electric text data is accumulated and available on the Internet, which causes increasing interests in text mining. Text classification is one of the most important subfields of text mining. In fact, text documents are often represented as a high-dimensional sparse document term matrix (DTM) before classification. Feature selection is essential and vital for text classification due to high dimensionality and sparsity of DTM. An efficient feature selection method is capable of both reducing dimensions of DTM and selecting discriminative features for text classification. Laplacian Score (LS) is one of the unsupervised feature selection methods and it has been successfully used in areas such as face recognition. However, LS is unable to select discriminative features for text classification and to effectively reduce the sparsity of DTM. To improve it, this paper proposes an unsupervised feature selection method named Distance Variance Score (DVS). DVS uses feature distance contribution (a ratio) to rank the importance of features for text documents so as to select discriminative features. Experimental results indicate that DVS is able to select discriminative features and reduce the sparsity of DTM. Thus, it is much more efficient than LS.

Date: 2015
References: Add references at CitEc
Citations:

Downloads: (external link)
http://downloads.hindawi.com/journals/MPE/2015/695720.pdf (application/pdf)
http://downloads.hindawi.com/journals/MPE/2015/695720.xml (text/xml)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:hin:jnlmpe:695720

DOI: 10.1155/2015/695720

Access Statistics for this article

More articles in Mathematical Problems in Engineering from Hindawi
Bibliographic data for series maintained by Mohamed Abdelhakeem ().

 
Page updated 2025-03-19
Handle: RePEc:hin:jnlmpe:695720