EconPapers    

Representing documents using an explicit model of their similarities

Brian T. Bartell, Garrison W. Cottrell and Richard K. Belew

Journal of the American Society for Information Science, 1995, vol. 46, issue 4, 254-271

Abstract: A method is proposed for creating vector space representations of documents based on modeling target interdocument similarity values. The target similarity values are assumed to capture semantic relationships, or associations, between the documents. The vector representations are chosen so that the inner product similarities between document vector pairs closely match their target interdocument similarities. The method is closely related to the Latent Semantic Indexing approach; in fact, they are equivalent when the target similarities are derived directly from document similarities based on term co‐occurrence. However, our method allows for external sources of interdocument semantic constraints to be used in the indexing, though at greater computational expense. The method is applied to three standard text databases from the information retrieval literature. On the CISI database of information science abstracts, performance (measured by precision averaged over a range of recall levels) improves by 28% compared to a weighted term‐vector approach, and improves 10% compared to Latent Semantic Indexing. Similar improvement is obtained on the Cranfield database, but no improvement is obtained for the artificial MED database of medical abstracts. The generally favorable performance suggests interesting potential for methods which explicitly modify the retrieval system to meet interdocument semantic constraints. © 1995 John Wiley & Sons, Inc.
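The core idea, choosing document vectors whose inner products reproduce a target similarity matrix, can be illustrated with a small sketch. It assumes the target interdocument similarities form a symmetric positive semidefinite matrix S. Under that assumption, the k-dimensional vectors X whose inner products XXᵀ best match S in the least-squares (Frobenius) sense are the top k eigenvectors of S, each scaled by the square root of its eigenvalue (the Eckart-Young result that also underlies classical multidimensional scaling). This textbook construction is used here for illustration and is not necessarily the optimization procedure used in the paper. The helper names (`top_eigenpairs`, `embed_documents`, `gram`) and the toy term-by-document matrix `A` are hypothetical. To keep the example dependency-free, eigenpairs are found with plain power iteration and deflation.

```python
import math
import random

Matrix = list[list[float]]


def mat_vec(m: Matrix, v: list[float]) -> list[float]:
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v: list[float]) -> tuple[list[float], float]:
    """Return v scaled to unit length, together with its original norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return ([x / norm for x in v] if norm > 0 else v), norm


def top_eigenpairs(s: Matrix, k: int, iters: int = 500, seed: int = 0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration + deflation.

    Assumption: s is positive semidefinite, so the largest-magnitude
    eigenvalue found by power iteration is also the largest eigenvalue.
    """
    rng = random.Random(seed)
    n = len(s)
    work = [row[:] for row in s]
    pairs = []
    for _ in range(k):
        v, _ = normalize([rng.uniform(-1.0, 1.0) for _ in range(n)])
        for _ in range(iters):
            w, norm = normalize(mat_vec(work, v))
            if norm == 0.0:  # remaining matrix annihilates v; stop early
                break
            v = w
        lam = sum(a * b for a, b in zip(v, mat_vec(work, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: subtract the found component so the next pass finds the next one.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def embed_documents(target: Matrix, k: int) -> Matrix:
    """Rows are k-dim document vectors whose inner products approximate target."""
    pairs = top_eigenpairs(target, k)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(target))]


def gram(x: Matrix) -> Matrix:
    """Inner-product similarity between every pair of document vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in x] for xi in x]


# Hypothetical toy term-by-document matrix A (2 terms x 3 documents).
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
# Co-occurrence-based target: S = A^T A (document-by-document inner products).
S = [[sum(A[t][i] * A[t][j] for t in range(len(A))) for j in range(3)]
     for i in range(3)]

X1 = embed_documents(S, k=1)  # keeps only the dominant similarity direction
X2 = embed_documents(S, k=2)  # S has rank 2, so this reproduces S exactly
```

When the target is built from term co-occurrence as S = AᵀA, the eigenvectors of S are the right singular vectors of A, and the eigenvalues are the squared singular values. The resulting vectors therefore have the same inner products as LSI document vectors V_kΣ_k, which is the equivalence the abstract describes. An externally supplied S simply takes the place of AᵀA. If such an S is not positive semidefinite, a proper symmetric eigensolver would be needed instead of power iteration.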

Date: 1995

Downloads:
https://doi.org/10.1002/(SICI)1097-4571(199505)46:43.0.CO;2-S

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:46:y:1995:i:4:p:254-271

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571

More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Page updated 2025-03-19
Handle: RePEc:bla:jamest:v:46:y:1995:i:4:p:254-271