Metadata harvesting for content‐based distributed information retrieval
Fabio Simeoni, Murat Yakici, Steve Neely and Fabio Crestani
Journal of the American Society for Information Science and Technology, 2008, vol. 59, issue 1, 12-24
Abstract:
We propose an approach to content‐based Distributed Information Retrieval based on the periodic and incremental centralization of full‐content indices of widely dispersed and autonomously managed document sources. Inspired by the success of the Open Archives Initiative's (OAI) Protocol for Metadata Harvesting, the approach occupies middle ground between content crawling and distributed retrieval. As in crawling, some data move toward the retrieval process, but they are statistics about the content rather than the content itself; this allows more efficient use of network resources and a wider scope of application. As in distributed retrieval, some processing is distributed along with the data, but it is indexing rather than retrieval; this reduces the costs of content provision while promoting the simplicity, effectiveness, and responsiveness of retrieval. Overall, we argue that the approach retains the good properties of centralized retrieval without giving up cost‐effective, large‐scale resource pooling. We discuss the requirements associated with the approach and identify two strategies to deploy it on top of the OAI infrastructure. In particular, we define a minimal extension of the OAI protocol which supports the coordinated harvesting of full‐content indices and descriptive metadata for content resources. Finally, we report on the implementation of a proof‐of‐concept prototype service for multimodel content‐based retrieval of distributed file collections.
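Illustrative sketch (not taken from the article): the idea summarized in the abstract, that sources index their own content and a central service periodically harvests only the term statistics that changed since its last visit, can be pictured roughly as below. All class, method, and field names (`IndexRecord`, `Source`, `CentralIndex`, `list_records`, `harvest`) are hypothetical and chosen for illustration; they do not reproduce the authors' OAI protocol extension or any real OAI library.

```python
from dataclasses import dataclass, field


@dataclass
class IndexRecord:
    """Hypothetical harvested record: term statistics for one document."""
    doc_id: str
    datestamp: str    # ISO date string, compared lexicographically
    term_freqs: dict  # term -> frequency of the term in the document


@dataclass
class Source:
    """Hypothetical content provider that indexes its documents locally."""
    name: str
    records: list = field(default_factory=list)

    def list_records(self, since=None):
        # Incremental harvesting: expose only records newer than `since`.
        return [r for r in self.records
                if since is None or r.datestamp > since]


class CentralIndex:
    """Central index built from harvested statistics, never from content."""

    def __init__(self):
        self.postings = {}      # term -> {(source, doc_id): term frequency}
        self.doc_terms = {}     # (source, doc_id) -> terms currently indexed
        self.last_harvest = {}  # source name -> latest datestamp harvested

    def harvest(self, source):
        """Merge records changed since the last harvest; return their count."""
        new = source.list_records(self.last_harvest.get(source.name))
        for rec in new:
            key = (source.name, rec.doc_id)
            # Drop stale postings if this document was indexed before.
            for term in self.doc_terms.get(key, ()):
                self.postings[term].pop(key, None)
            for term, tf in rec.term_freqs.items():
                self.postings.setdefault(term, {})[key] = tf
            self.doc_terms[key] = set(rec.term_freqs)
        if new:
            self.last_harvest[source.name] = max(r.datestamp for r in new)
        return len(new)

    def search(self, term):
        """Return (source, doc_id) pairs ranked by raw term frequency."""
        hits = self.postings.get(term, {})
        return sorted(hits, key=lambda k: (-hits[k], k))
```

In this sketch retrieval runs entirely against the central index, while the cost of indexing stays with each source; ranking by raw term frequency is a placeholder assumption, not the retrieval models used in the prototype the abstract mentions.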
Date: 2008
Downloads: https://doi.org/10.1002/asi.20694
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:59:y:2008:i:1:p:12-24
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890
More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.