Using N‐grams for Arabic text searching

Suleiman H. Mustafa and Qasem A. Al‐Radaideh

Journal of the American Society for Information Science and Technology, 2004, vol. 55, issue 11, 1002-1007

Abstract: N‐grams have been widely investigated for a number of text processing and retrieval applications. This article examines the performance of the digram and trigram term conflation techniques in the context of Arabic free-text retrieval. It reports the results of applying the N‐gram approach to a corpus of thousands of distinct textual words drawn from a number of sources representing various disciplines. The results indicate that the digram method performs better than the trigram method with respect to conflation precision and conflation recall ratios. In either case, the N‐gram approach does not appear to provide an efficient conflation approach, because the peculiarities of the Arabic infix structure reduce the rate of correct N‐gram matching.
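
To make the infix effect concrete, here is a minimal sketch of N‐gram term conflation. The abstract does not say which similarity measure the authors used, so the Dice coefficient over unique N‐grams is an assumption here, as are the function names `ngrams` and `dice_similarity`. The example words are illustrative choices and are not taken from the article's corpus.

```python
def ngrams(word, n):
    """Return the set of unique contiguous character n-grams in a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_similarity(a, b, n):
    """Dice coefficient over unique n-grams: 2|A & B| / (|A| + |B|).

    Assumption: Dice is a common choice for n-gram conflation; the
    abstract does not confirm it is the measure used in the article.
    """
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0  # both words shorter than n: nothing to compare
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# Illustrative words (hypothetical test data, not from the paper):
singular = "كتاب"      # "book"
broken_plural = "كتب"  # "books", formed by an internal (infix) change
dual = "كتابان"        # "two books", formed by a suffix

for n, label in ((2, "digram"), (3, "trigram")):
    infix_score = dice_similarity(singular, broken_plural, n)
    suffix_score = dice_similarity(singular, dual, n)
    print(f"{label}: infix pair = {infix_score:.2f}, suffix pair = {suffix_score:.2f}")
```

In this sketch the suffixed pair keeps most of its N‐grams in common, while the infix pair shares only one digram and no trigram at all. That is in line with the abstract's observation that infixation lowers the rate of correct N‐gram matching, and with digrams faring somewhat better than trigrams.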

Date: 2004

Downloads: (external link)
https://doi.org/10.1002/asi.20051

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:55:y:2004:i:11:p:1002-1007

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890

More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

Handle: RePEc:bla:jamist:v:55:y:2004:i:11:p:1002-1007