Corpus‐based cross‐language information retrieval in retrieval of highly relevant documents

Tuomas Talvensaari, Martti Juhola, Jorma Laurikkala and Kalervo Järvelin

Journal of the American Society for Information Science and Technology, 2007, vol. 58, issue 3, 322-334

Abstract: Information retrieval systems' ability to retrieve highly relevant documents has become more and more important in the age of extremely large collections, such as the World Wide Web (WWW). The authors' aim was to find out how corpus‐based cross‐language information retrieval (CLIR) manages in retrieving highly relevant documents. They created a Finnish–Swedish comparable corpus from two loosely related document collections and used it as a source of knowledge for query translation. Finnish test queries were translated into Swedish and run against a Swedish test collection. Graded relevance assessments were used in evaluating the results and three relevance criterion levels—liberal, regular, and stringent—were applied. The runs were also evaluated with generalized recall and precision, which weight the retrieved documents according to their relevance level. The performance of the Comparable Corpus Translation system (COCOT) was compared to that of a dictionary‐based query translation program; the two translation methods were also combined. The results indicate that corpus‐based CLIR performs particularly well with highly relevant documents. In average precision, COCOT even matched the monolingual baseline on the highest relevance level. The performance of the different query translation methods was further analyzed by finding out reasons for poor rankings of highly relevant documents.

Date: 2007

Downloads: https://doi.org/10.1002/asi.20495

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:58:y:2007:i:3:p:322-334

Handle: RePEc:bla:jamist:v:58:y:2007:i:3:p:322-334