EconPapers    

Topic modeling for mediated access to very large document collections

Gheorghe Muresan and David J. Harper

Journal of the American Society for Information Science and Technology, 2004, vol. 55, issue 10, 892-910

Abstract: Clear and precise queries are a necessity when searching very large document collections, especially when query‐based retrieval is the only means of exploration. We propose system‐mediated information access as a solution for users' well‐documented inability to formulate good queries. Our approach is based on two main assumptions: first, on the ability of document clustering to reveal the topical, semantic structure of a problem domain represented by a specialized “source collection,” and, second, on the capacity of statistical language models to convey content. Taking the role of the human mediator or intermediary searcher, a mediation system interacts with the user and supports her exploration of a relatively small source collection, chosen to be representative of the problem domain. Based on the user's selection of relevant “exemplary” documents and clusters from this source collection, the system builds a language model of her information need. This model is subsequently used to derive “mediated queries,” which are expected to convey the user's information need precisely and comprehensively, and which the user can submit to search any large and heterogeneous “target collections.” We present results of experiments that simulated various mediation strategies and compared the effect of a variety of parameters, such as the similarity measure, the weighting scheme, and the clustering method, on mediation effectiveness. The results provide both upper bounds on the performance that real end users could potentially reach and a comparison of the effectiveness of these strategies. The experimental evidence suggests that information retrieval mediated through a clustered specialized collection has the potential to improve effectiveness significantly.
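The abstract describes building a language model of the user's information need from selected exemplary documents and deriving a mediated query from it, but it does not give the estimation or term-selection details. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes a maximum-likelihood unigram model over lowercased whitespace tokens, Jelinek-Mercer smoothing of the need model toward a model of the source collection with a hypothetical weight `lam`, and ranking of candidate terms by their contribution to the KL divergence between the need model and the source-collection model. The function names `unigram_model` and `mediated_query` and the toy documents are illustrative only.

```python
import math
from collections import Counter


def unigram_model(docs):
    """Maximum-likelihood unigram model: P(term) = count(term) / total tokens."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}


def mediated_query(exemplars, source_collection, k=5, lam=0.5):
    """Hypothetical sketch: pick the k terms that most distinguish the
    user's exemplar documents from the source collection as a whole."""
    need = unigram_model(exemplars)                # model of the information need
    background = unigram_model(source_collection)  # model of the source collection
    scores = {}
    for term, p in need.items():
        p_bg = background.get(term)
        if p_bg is None:
            # Assumption: exemplars are drawn from the source collection, so this
            # only triggers for outside text; such terms are simply skipped.
            continue
        # Jelinek-Mercer smoothing of the need model toward the background.
        p_need = (1 - lam) * p + lam * p_bg
        # Contribution of this term to KL(need || background).
        scores[term] = p_need * math.log(p_need / p_bg)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy source collection (illustrative only); the user marks two exemplars.
source = [
    "query formulation is hard for users",
    "document clustering reveals topical structure",
    "language models convey document content",
    "stock prices fell sharply today",
    "football match ended in a draw",
]
exemplars = source[1:3]

result = mediated_query(exemplars, source, k=5)
print(" ".join(result))
```

In this toy run, "document", which occurs in both exemplars, ranks first, while terms found only in non-selected documents (such as "stock") never enter the query. How the actual system weights, smooths, or formats terms for a target search engine is not specified in the abstract.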

Date: 2004

Downloads: (external link)
https://doi.org/10.1002/asi.20034

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:55:y:2004:i:10:p:892-910

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890

More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery ().

Handle: RePEc:bla:jamist:v:55:y:2004:i:10:p:892-910