Method for evaluation of stemming algorithms based on error counting
Chris D. Paice
Journal of the American Society for Information Science, 1996, vol. 47, issue 8, 632-649
Abstract:
In most previous studies, the effectiveness of stemming algorithms has been compared by determining the retrieval performance for various experimental test collections. The present work assesses performance by counting the number of identifiable errors during the stemming of words from various text samples. This entails manual grouping of the words in each sample; software has been developed to facilitate this. After grouping, the words are stemmed and indices are then computed which represent the rate of understemming and overstemming. Results are presented for three stemmers (Lovins, Porter, and Paice/Husk), in each case using three distinct text samples. Although the results are not entirely clear cut, it appears that the Lovins stemmer is inferior to the other two in terms of general accuracy. The way in which the indices vary with the size of the text sample is also investigated. © 1996 John Wiley & Sons, Inc.
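The error-counting idea in the abstract can be illustrated with a minimal pair-counting sketch. It assumes that understemming is measured as the fraction of word pairs from the same manual group that receive different stems, and overstemming as the fraction of word pairs from different groups that receive the same stem. The abstract does not give the exact index definitions, so these formulas, the function names `error_indices` and `toy_stem`, and the sample words are illustrative assumptions, not the paper's implementation.

```python
from itertools import combinations


def error_indices(groups, stem):
    """Return (understemming_rate, overstemming_rate) for a stemmer.

    groups: list of word lists, one list per manually built group
    stem:   function mapping a word to its stem
    Assumption: the rates are simple pair fractions, which may differ
    from the exact indices defined in the paper.
    """
    labelled = [(stem(w), gid) for gid, g in enumerate(groups) for w in g]
    same_group = diff_group = missed = wrong = 0
    for (s1, g1), (s2, g2) in combinations(labelled, 2):
        if g1 == g2:
            same_group += 1
            missed += s1 != s2   # should have merged, but did not
        else:
            diff_group += 1
            wrong += s1 == s2    # should not have merged, but did
    ui = missed / same_group if same_group else 0.0
    oi = wrong / diff_group if diff_group else 0.0
    return ui, oi


def toy_stem(word):
    # Hypothetical light stemmer: strips one common suffix, illustration only.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


# Hypothetical hand-grouped sample (not data from the paper).
groups = [
    ["connect", "connected", "connecting", "connection"],
    ["wander", "wandering"],
    ["wand", "wands"],
]

light = error_indices(groups, toy_stem)          # (0.375, 0.0)
heavy = error_indices(groups, lambda w: w[:4])   # (0.0, 0.2)
```

On this toy sample the light stemmer leaves "connection" unmerged (understemming only), while truncating every word to four letters merges the "wander" and "wand" groups (overstemming only), which shows why both rates are needed to compare stemmers.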
Date: 1996
Downloads: (external link)
https://doi.org/10.1002/(SICI)1097-4571(199608)47:83.0.CO;2-U
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:47:y:1996:i:8:p:632-649
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571
More articles in Journal of the American Society for Information Science from Association for Information Science & Technology