EconPapers    
Economics at your fingertips  
 

An analysis of the relative hardness of Reuters‐21578 subsets

Franca Debole and Fabrizio Sebastiani

Journal of the American Society for Information Science and Technology, 2005, vol. 56, issue 6, 584-596

Abstract: The existence, public availability, and widespread acceptance of a standard benchmark for a given information retrieval (IR) task are beneficial to research on this task, because they allow different researchers to experimentally compare their own systems by comparing the results they have obtained on this benchmark. The Reuters‐21578 test collection, together with its earlier variants, has been such a standard benchmark for the text categorization (TC) task throughout the last 10 years. However, the benefits that this has brought about have somehow been limited by the fact that different researchers have “carved” different subsets out of this collection and tested their systems on one of these subsets only; systems that have been tested on different Reuters‐21578 subsets are thus not readily comparable. In this article, we present a systematic, comparative experimental study of the three subsets of Reuters‐21578 that have been most popular among TC researchers. The results we obtain allow us to determine the relative hardness of these subsets, thus establishing an indirect means for comparing TC systems that have, or will be, tested on these different subsets.

Date: 2005
References: Add references at CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
https://doi.org/10.1002/asi.20147

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:56:y:2005:i:6:p:584-596

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890

Access Statistics for this article

More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery ().

 
Page updated 2025-03-19
Handle: RePEc:bla:jamist:v:56:y:2005:i:6:p:584-596