Web‐based text classification in the absence of manually labeled training documents
Chen‐Ming Hung and Lee‐Feng Chien
Journal of the American Society for Information Science and Technology, 2007, vol. 58, issue 1, 88-96
Abstract:
Most text classification techniques assume that manually labeled documents (corpora) can be easily obtained while learning text classifiers. However, labeled training documents are sometimes unavailable, or inadequate even when they are available. The goal of this article is to present a self‐learned approach to extract high‐quality training documents from the Web when the required manually labeled documents are unavailable or of poor quality. To learn a text classifier automatically, we need only a set of user‐defined categories and some highly related keywords. Extensive experiments are conducted to evaluate the performance of the proposed approach using the test set from the Reuters‐21578 news data set. The experiments show that very promising results can be achieved using only documents automatically extracted from the Web.
Date: 2007
Downloads: https://doi.org/10.1002/asi.20442
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:58:y:2007:i:1:p:88-96
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890
More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology