Document retrieval experiments using cluster analysis
Jack Minker, Eero Peltola and Gerald A. Wilson
Journal of the American Society for Information Science, 1973, vol. 24, issue 4, 246-260
Abstract:
This paper describes the effect of using weighted index terms in a document retrieval system and evaluates retrieval performance when queries are expanded by terms that occur in clusters with the query terms. Three data collections, each indexed by several methods, are used to develop explicit results; two of the collections were studied and reported on in previous work. The study extends previous work at the University of Maryland. The effect of weighting index terms in the document collection, in the queries, and in the formation of clusters is analyzed. Eight cases are investigated in which index terms are weighted and unweighted. The best results are obtained when weighted index terms are used in forming clusters, in queries, and in documents. In this case, when clustered terms are added to queries, the results on the new collection show a significant improvement in retrieval performance relative to the performance with the unmodified data base. This improvement contrasts with the results of the previous study, where a degradation in performance, or at best an insignificant improvement, was obtained. Comparisons are made to related work by Sparck‐Jones and her colleagues. This study tends to support the conclusion of Sparck‐Jones that weighted index terms provide better retrieval performance than unweighted terms. The cluster addition of index terms to queries, however, yields unpredictable results: some collections show an improvement in retrieval performance, others a degradation or no change. Sparck‐Jones obtained an improvement in retrieval performance for her document collection. We conclude that the results are highly dependent upon the document collection and that the technique should be employed with caution.
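The query expansion described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the cluster contents, the helper names expand_query and score, the weight given to added terms, and the inner-product matching function. The final line reflects one plausible reading of the "eight cases", namely every weighted/unweighted combination for clusters, queries, and documents.

```python
from itertools import product

# Hypothetical term clusters (illustrative only, not from the paper's collections).
CLUSTERS = [
    {"retrieval", "search", "lookup"},
    {"cluster", "clique", "group"},
]


def expand_query(query, clusters, added_weight=0.5):
    """Add every term that shares a cluster with at least one query term.

    `query` maps term -> weight. Added terms receive `added_weight`, an
    arbitrary value chosen for this sketch; original weights are kept.
    """
    expanded = dict(query)
    for cluster in clusters:
        if not cluster.isdisjoint(query):
            for term in cluster:
                expanded.setdefault(term, added_weight)
    return expanded


def score(query, document):
    """Inner product of query and document term weights (assumed matching function)."""
    return sum(w * document.get(t, 0.0) for t, w in query.items())


# Assumed reading of the "eight cases": weighted or unweighted terms in each of
# cluster formation, queries, and documents (2 x 2 x 2 combinations).
CASES = list(product(("weighted", "unweighted"), repeat=3))
```

In this toy setting, a query containing only "retrieval" does not match a document indexed only under "search" until the query is expanded. The same mechanism can also pull in unhelpful terms, which is consistent with the abstract's caution that results depend on the collection.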
Date: 1973
DOI: https://doi.org/10.1002/asi.4630240404
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:24:y:1973:i:4:p:246-260
More articles in Journal of the American Society for Information Science from Association for Information Science & Technology