Web‐based text classification in the absence of manually labeled training documents

Hung, Chen‐Ming; Chien, Lee‐Feng

Journal of the American Society for Information Science and Technology, 2007, vol. 58, issue 1, 88-96

Abstract: Most text classification techniques assume that manually labeled documents (corpora) are readily available when text classifiers are learned. In practice, labeled training documents are sometimes unavailable, or inadequate even when they are available. This article presents a self‐learned approach that extracts high‐quality training documents from the Web when the required manually labeled documents are unavailable or of poor quality. Learning a text classifier automatically requires only a set of user‐defined categories and some highly related keywords. Extensive experiments evaluate the proposed approach on the test set of the Reuters‐21578 news data set. The experiments show that very promising results can be achieved using only documents automatically extracted from the Web.
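
The general idea in the abstract, learning a classifier from nothing more than category names and a few related keywords, can be illustrated with a small sketch. This is not the authors' method: the abstract does not describe how documents are extracted or which classifier is used, so the keyword-count pseudo-labeling, the multinomial Naive Bayes model, the category names, the keywords, and the in-memory `POOL` (a stand-in for documents retrieved from the Web) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical user-defined categories and keywords (illustrative only).
CATEGORY_KEYWORDS = {
    "grain": ["wheat", "corn"],
    "crude": ["oil", "barrel"],
}

# Hypothetical stand-in for unlabeled documents retrieved from the Web.
POOL = [
    "Wheat and corn harvest forecasts rose after heavy rain",
    "Farmers expect a record wheat crop this season",
    "Oil prices climbed as barrel output was cut by producers",
    "Crude oil exports fell while refinery demand stayed high",
    "The football match ended in a draw",
]


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def pseudo_label(pool, category_keywords):
    """Label each document with the category whose keywords it mentions most.

    Documents that match no keyword, or that tie between categories,
    are skipped rather than guessed.
    """
    labeled = []
    for doc in pool:
        counts = Counter(tokenize(doc))
        scores = {
            cat: sum(counts[k] for k in kws)
            for cat, kws in category_keywords.items()
        }
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        best_cat, best = ranked[0]
        if best == 0 or (len(ranked) > 1 and ranked[1][1] == best):
            continue
        labeled.append((doc, best_cat))
    return labeled


def train_naive_bayes(labeled):
    """Collect per-category word counts, document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for doc, cat in labeled:
        tokens = tokenize(doc)
        word_counts[cat].update(tokens)
        doc_counts[cat] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the most probable category under add-one-smoothed Naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_cat, best_score = None, float("-inf")
    for cat in doc_counts:
        total_words = sum(word_counts[cat].values())
        score = math.log(doc_counts[cat] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[cat][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat


model = train_naive_bayes(pseudo_label(POOL, CATEGORY_KEYWORDS))
print(classify("Producers cut refinery output", model))
```

Note that the query in the last line contains none of the seed keywords; the classifier can still assign it a category because it has learned other words from the pseudo-labeled documents. A real pipeline would replace `POOL` with search results and would need stronger filtering of noisy documents.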

Date: 2007

Downloads: (external link)
https://doi.org/10.1002/asi.20442

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:58:y:2007:i:1:p:88-96
