Authority and ranking effects in data fusion
Anselm Spoerri
Journal of the American Society for Information Science and Technology, 2008, vol. 59, issue 3, 450-460
Abstract:
This paper provides empirical support for some of the key assumptions guiding the design of data fusion methods. It computes and analyzes the overlap structures between the search results of retrieval systems that participated in the short, long, and manual tracks in TREC 3, 6, 7, and 8 to examine what can be learned to infer a document's probability of being relevant. This paper shows that the potential relevance of a document increases exponentially as the number of systems retrieving it increases—called the Authority Effect. It also shows that documents higher up in ranked lists and found by more systems are more likely to be relevant—called the Ranking Effect. A contribution of this paper is that it shows that the Authority and Ranking Effects can be observed regardless of whether a query is generated manually or automatically and short or long queries are used. Further, it is illustrated that the Authority and Ranking Effects can be observed if the result sets of random groupings of five retrieval systems are compared and only the top 50 results are used in the overlap computation. Also discussed is how the Authority and Ranking Effects can help explain why major data fusion methods perform well.
Date: 2008
References: Add references at CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
https://doi.org/10.1002/asi.20760
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:59:y:2008:i:3:p:450-460
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890
Access Statistics for this article
More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery ().