EconPapers    
Economics at your fingertips  
 

Indexing and searching strategies for the Russian language

Ljiljana Dolamic and Jacques Savoy

Journal of the American Society for Information Science and Technology, 2009, vol. 60, issue 12, 2540-2547

Abstract: This paper describes and evaluates various stemming and indexing strategies for the Russian language. We design and evaluate two stemming approaches, a light and a more aggressive one, and compare these stemmers to the Snowball stemmer, to no stemming, and also to a language‐independent approach (n‐gram). To evaluate the suggested stemming strategies we apply various probabilistic information retrieval (IR) models, including the Okapi, the Divergence from Randomness (DFR), a statistical language model (LM), as well as two vector‐space approaches, namely, the classical tf idf scheme and the dtu‐dtn model. We find that the vector‐space dtu‐dtn and the DFR models tend to result in better retrieval effectiveness than the Okapi, LM, or tf idf models, while only the latter two IR approaches result in statistically significant performance differences. Ignoring stemming generally reduces the MAP by more than 50%, and these differences are always significant. When applying an n‐gram approach, performance differences are usually lower than an approach involving stemming. Finally, our light stemmer tends to perform best, although performance differences between the light, aggressive, and Snowball stemmers are not statistically significant.

Date: 2009
References: Add references at CitEc
Citations:

Downloads: (external link)
https://doi.org/10.1002/asi.21191

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:60:y:2009:i:12:p:2540-2547

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890

Access Statistics for this article

More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery ().

 
Page updated 2025-03-19
Handle: RePEc:bla:jamist:v:60:y:2009:i:12:p:2540-2547