Improving performance of text categorization by combining filtering and support vector machines

Díaz, Irene; Ranilla, José; Montañes, Elena; Fernández, Javier; Combarro, Elías F.

Journal of the American Society for Information Science and Technology, 2004, vol. 55, issue 7, 579-592

Abstract: Text categorization is the process of assigning documents to a set of previously fixed categories. Much research is under way with the goal of automating this time-consuming task. Several different algorithms have been applied, and Support Vector Machines (SVM) have shown very good results. In this report, we try to prove that filtering the words used by SVM before classification can improve overall performance. This hypothesis is systematically tested with three different measures of word relevance, on two different corpora (one of them considered in three different splits), and with both local and global vocabularies. The results show that filtering significantly improves the recall of the method, and that it also significantly improves overall performance.
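
The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not name its three measures of word relevance, so information gain is used here purely as an assumed stand-in; the function names (`entropy`, `information_gain`, `filter_vocabulary`) and the toy documents are hypothetical and are not taken from the article. The sketch scores each word against one category, keeps the top-ranked words, and reduces each document to that filtered vocabulary. Training the SVM on the reduced documents is left out.

```python
import math


def entropy(pos, neg):
    """Binary entropy (in bits) of a set with `pos` positive and `neg` negative items."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, word):
    """Information gain of `word` for one category.

    docs   -- list of sets of words, one set per document
    labels -- list of booleans, True if the document belongs to the category
    """
    n = len(docs)
    pos = sum(labels)
    with_word = [lab for d, lab in zip(docs, labels) if word in d]
    without_word = [lab for d, lab in zip(docs, labels) if word not in d]
    gain = entropy(pos, n - pos)
    for part in (with_word, without_word):
        if part:
            p = sum(part)
            gain -= len(part) / n * entropy(p, len(part) - p)
    return gain


def filter_vocabulary(docs, labels, keep):
    """Return the `keep` highest-scoring words (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda w: (-information_gain(docs, labels, w), w))
    return ranked[:keep]


# Toy data (assumed for illustration): four documents, one category "grain".
docs = [
    {"wheat", "price", "rises"},
    {"wheat", "harvest", "the"},
    {"oil", "price", "the"},
    {"oil", "export", "rises"},
]
labels = [True, True, False, False]

kept = filter_vocabulary(docs, labels, keep=2)
# Each document is reduced to the filtered vocabulary before any classifier sees it.
filtered_docs = [d & set(kept) for d in docs]
```

In this toy setup the words that separate the category perfectly ("wheat" and "oil") rank first, while words spread evenly across both classes ("price", "rises", "the") score zero and are dropped. How many words to keep, and which relevance measure to use, are exactly the choices the article evaluates, so the values here should be read as placeholders.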

Date: 2004

Downloads: (external link)
https://doi.org/10.1002/asi.10409

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:55:y:2004:i:7:p:579-592
