EconPapers    
Economics at your fingertips  
 

A novel approach to the extraction of roots from Arabic words using bigrams

Ismail I. Hmeidi, Riyad F. Al‐Shalabi, Ahmad T. Al‐Taani, Hassan Najadat and Shaker A. Al‐Hazaimeh

Journal of the American Society for Information Science and Technology, 2010, vol. 61, issue 3, 583-591

Abstract: Root extraction is an important task in information retrieval (IR), natural language processing (NLP), text summarization, and many other fields. Over the last two decades, several algorithms have been proposed for extracting Arabic roots. Most of them handle only triliteral roots, and some handle only words of a fixed length. In this study, a novel approach to extracting roots from Arabic words using bigrams is proposed. Two measures are used: the Manhattan distance, a dissimilarity measure, and Dice's similarity measure. The proposed algorithm is tested on the Holy Qur'an and on a corpus of 242 abstracts from the Proceedings of the Saudi Arabian National Computer Conferences. The two files cover a wide range of data: the Holy Qur'an contains most of the ancient Arabic words, while the other file contains modern Arabic words and words borrowed from foreign languages in addition to original Arabic words. The results of this study showed that combining N‐grams with the Dice measure gives better results than using the Manhattan distance measure.
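The abstract does not specify how the bigrams and the two measures are combined. The Python snippet below is a minimal sketch under stated assumptions, not the authors' algorithm: each word is split into overlapping character bigrams, Dice's coefficient and the Manhattan (L1) distance are computed over bigram counts, and the root with the highest Dice score is picked from a hypothetical list of candidate roots. The function names `bigrams`, `dice`, `manhattan`, and `best_root` are illustrative only.

```python
from collections import Counter


def bigrams(word: str) -> list[str]:
    """Overlapping character bigrams (assumption: contiguous pairs only)."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def dice(a: str, b: str) -> float:
    """Dice similarity: 2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|)."""
    ba, bb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 0.0
    shared = sum((ba & bb).values())  # multiset intersection of bigram counts
    return 2 * shared / total


def manhattan(a: str, b: str) -> int:
    """Manhattan (L1) distance between the two bigram frequency vectors."""
    ba, bb = Counter(bigrams(a)), Counter(bigrams(b))
    return sum(abs(ba[k] - bb[k]) for k in set(ba) | set(bb))


def best_root(word: str, candidate_roots: list[str]) -> str:
    """Hypothetical selection rule: the candidate root with the highest Dice score."""
    return max(candidate_roots, key=lambda root: dice(word, root))


if __name__ == "__main__":
    word = "كاتب"  # "writer"
    for root in ["كتب", "درس"]:
        print(root, dice(word, root), manhattan(word, root))
    print("best:", best_root(word, ["درس", "كتب"]))
```

For example, كاتب has the bigrams كا, ات, تب and shares only تب with the root كتب, giving a Dice score of 2·1/(3+2) = 0.4 and a Manhattan distance of 3; it shares no bigram with درس (Dice 0.0, distance 5). Contiguous bigrams miss root letters separated by pattern letters (كت is not a bigram of كاتب), so a practical system would likely need a richer scheme; the paper's own bigram construction is not described in the abstract.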

Date: 2010

Downloads:
https://doi.org/10.1002/asi.21247



Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:61:y:2010:i:3:p:583-591

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890


More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Page updated 2025-03-19
Handle: RePEc:bla:jamist:v:61:y:2010:i:3:p:583-591