On the complexity of Rocchio's similarity‐based relevance feedback algorithm
Zhixiang Chen and
Bin Fu
Journal of the American Society for Information Science and Technology, 2007, vol. 58, issue 10, 1392-1400
Abstract:
Rocchio's similarity‐based relevance feedback algorithm, one of the most important query reformation methods in information retrieval, is essentially an adaptive learning algorithm from examples in searching for documents represented by a linear classifier. Despite its popularity in various applications, there is little rigorous analysis of its learning complexity in literature. In this article, the authors prove for the first time that the learning complexity of Rocchio's algorithm is O(d + d2(log d + log n)) over the discretized vector space {0,…, n − 1}d, when the inner product similarity measure is used. The upper bound on the learning complexity for searching for documents represented by a monotone linear classifier $\left( {\overrightarrow q ,0} \right)$ over {0,…, n − 1}d can be improved to, at most, 1 + 2k (n − 1) (log d − log(n − 1)), where k is the number of nonzero components in q. Several lower bounds on the learning complexity are also obtained for Rocchio's algorithm. For example, the authors prove that Rocchio's algorithm has a lower bound $\Omega \left( {\left( {_2^d } \right){\rm{log}}\,n} \right)$ on its learning complexity over the Boolean vector space {0, 1}d.
Date: 2007
References: Add references at CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
https://doi.org/10.1002/asi.20612
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:58:y:2007:i:10:p:1392-1400
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890
Access Statistics for this article
More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery ().