Improving performance of text categorization by combining filtering and support vector machines
Irene Díaz,
José Ranilla,
Elena Montañes,
Javier Fernández and
Elías F. Combarro
Journal of the American Society for Information Science and Technology, 2004, vol. 55, issue 7, 579-592
Abstract:
Text categorization is the process of assigning documents to a set of previously fixed categories. Much current research aims to automate this time‐consuming task. Several different algorithms have been applied, and Support Vector Machines (SVM) have shown very good results. In this report, we try to show that filtering the words used by SVM before classification can improve overall performance. This hypothesis is systematically tested with three different measures of word relevance, on two different corpora (one of them considered in three different splits), and with both local and global vocabularies. The results show that filtering significantly improves the recall of the method, and that it also significantly improves overall performance.
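The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the three word-relevance measures, so information gain is used here purely as an assumed stand-in. The function names `information_gain` and `filter_vocabulary` are hypothetical, documents are assumed to be sets of tokens with binary labels for one category (a local vocabulary), and the SVM training that would follow on the reduced vocabulary is omitted.

```python
import math


def information_gain(docs, labels, word):
    """Information gain of a word's presence for a binary category label.

    Assumed stand-in measure; the paper's own measures are not given here.
    docs: list of token sets; labels: list of 0/1 ints for one category.
    """
    def entropy(pos, total):
        # Binary entropy in bits; 0 for empty or pure groups.
        if total == 0 or pos == 0 or pos == total:
            return 0.0
        p = pos / total
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    n = len(docs)
    with_word = [lab for doc, lab in zip(docs, labels) if word in doc]
    without_word = [lab for doc, lab in zip(docs, labels) if word not in doc]
    # Entropy of the label after splitting documents on word presence.
    conditional = (
        (len(with_word) / n) * entropy(sum(with_word), len(with_word))
        + (len(without_word) / n) * entropy(sum(without_word), len(without_word))
    )
    return entropy(sum(labels), n) - conditional


def filter_vocabulary(docs, labels, keep_fraction):
    """Keep the top fraction of words ranked by the relevance score."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(
        vocab,
        key=lambda w: information_gain(docs, labels, w),
        reverse=True,
    )
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]


# Toy data (made up for illustration): two documents in the category, two outside.
docs = [{"wheat", "price"}, {"wheat", "farm"}, {"oil", "price"}, {"oil", "barrel"}]
labels = [1, 1, 0, 0]
kept = filter_vocabulary(docs, labels, keep_fraction=0.4)
```

In this toy example, "price" appears equally often inside and outside the category, so it scores zero and is filtered out, while "wheat" and "oil" separate the two groups perfectly and are kept. A classifier would then be trained only on the retained words.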
Date: 2004
Downloads: (external link)
https://doi.org/10.1002/asi.10409
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:55:y:2004:i:7:p:579-592