Predicting the effectiveness of naïve data fusion on the basis of system characteristics
Kwong Bor Ng and Paul B Kantor
Journal of the American Society for Information Science, 2000, vol. 51, issue 13, 1177-1189
Abstract:
Effective automation of the information retrieval task has long been an active area of research, leading to sophisticated retrieval models. With many IR schemes available, researchers have begun to investigate the benefits of combining the results of different IR schemes to improve performance, in the process called “data fusion.” There are many successful data fusion experiments reported in the IR literature, but there are also cases in which it did not work well. Thus, it would be quite valuable to have a theory that can predict, in advance, whether fusion of two or more retrieval schemes will be worth doing. In a previous study (Ng & Kantor, 1998), we identified two predictive variables for the effectiveness of fusion: (a) a list‐based measure of output dissimilarity, and (b) a pair‐wise measure of the similarity of performance of the two schemes. In this article we investigate the predictive power of these two variables in simple symmetrical data fusion. We use the IR systems participating in the TREC 4 routing task to train a model that predicts the effectiveness of data fusion, and use the IR systems participating in the TREC 5 routing task to test that model. The model asks, “when will fusion perform better than an oracle who uses the best scheme from each pair?” We explore statistical techniques for fitting the model to the training data and use the receiver operating characteristic curve of signal detection theory to represent the power of the resulting models. The trained prediction methods predict whether fusion will beat an oracle at levels much higher than could be achieved by chance.
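The abstract does not give the fusion rule or the exact definitions of the two predictive variables, so the following is only a rough sketch of the kind of quantities involved, not the authors' method. It assumes that "simple symmetrical" fusion can be illustrated by adding min-max-normalized scores, and it uses hypothetical stand-ins for the two variables: one minus the top-k overlap for output dissimilarity, and the ratio of the lower to the higher precision for similarity of performance. All function names, scores, and relevance judgments are invented for illustration.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def fuse(a, b):
    """Assumed fusion rule: add normalized scores, treating both schemes alike.
    A document missing from one scheme contributes 0 for that scheme."""
    na, nb = min_max(a), min_max(b)
    return {d: na.get(d, 0.0) + nb.get(d, 0.0) for d in set(na) | set(nb)}


def top_k(scores, k):
    """Top-k document ids by descending score (ties broken by id)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [d for d, _ in ranked[:k]]


def precision_at_k(scores, relevant, k):
    """Fraction of the top-k documents that are relevant."""
    return sum(d in relevant for d in top_k(scores, k)) / k


def output_dissimilarity(a, b, k):
    """Hypothetical list-based measure: 1 minus the top-k overlap fraction."""
    return 1.0 - len(set(top_k(a, k)) & set(top_k(b, k))) / k


def performance_ratio(p_a, p_b):
    """Hypothetical pair-wise measure: lower precision over higher precision."""
    hi = max(p_a, p_b)
    return min(p_a, p_b) / hi if hi > 0 else 1.0


# Invented toy data: two schemes scoring overlapping document sets.
scheme_a = {"d1": 0.9, "d2": 0.8, "d3": 0.3, "d4": 0.1}
scheme_b = {"d3": 5.0, "d5": 4.0, "d1": 3.0, "d6": 1.0}
relevant = {"d1", "d3"}
k = 2

p_a = precision_at_k(scheme_a, relevant, k)
p_b = precision_at_k(scheme_b, relevant, k)
p_fused = precision_at_k(fuse(scheme_a, scheme_b), relevant, k)

dissimilarity = output_dissimilarity(scheme_a, scheme_b, k)
ratio = performance_ratio(p_a, p_b)
# The question posed in the abstract: does fusion beat the better of the pair?
fusion_beats_best = p_fused > max(p_a, p_b)

print(dissimilarity, ratio, fusion_beats_best)
```

In a setup like the one the abstract describes, pairs of values such as `dissimilarity` and `ratio` would serve as inputs to a trained model, with `fusion_beats_best` as the outcome to be predicted; how the authors actually fit and evaluate that model is not specified here.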
Date: 2000
Downloads: (external link)
https://doi.org/10.1002/1097-4571(2000)9999:99993.0.CO;2-E
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:51:y:2000:i:13:p:1177-1189
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571