A comparison study of some Arabic root finding algorithms

Al‐Shawakfa, Emad; Al‐Badarneh, Amer; Shatnawi, Safwan; Al‐Rabab'ah, Khaleel; Bani‐Ismail, Basel

Journal of the American Society for Information Science and Technology, 2010, vol. 61, issue 5, 1015-1024

Abstract: Arabic has a complex structure, which makes natural language processing (NLP) difficult to apply to it. Much research on Arabic NLP (ANLP) exists; however, it is not as mature as research on other languages. Finding Arabic roots is an important step toward effective research on most ANLP applications. The authors studied and compared six root‐finding algorithms with success rates of over 90%. These algorithms had not all been evaluated with the same testing corpus and/or benchmarking measures. The authors unified the testing process by implementing the algorithms from their own descriptions and by building a corpus from 3823 triliteral roots, 73 triliteral patterns, and 18 affixes, which produced around 27.6 million words. They tested the algorithms on the generated corpus and obtained interesting results; they offer to share the corpus freely for benchmarking and ANLP research.
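
The corpus construction and the benchmarking described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`apply_pattern`, `generate_words`, `accuracy`), the tiny root, pattern, and affix lists, and the rule for combining prefixes with suffixes are all hypothetical, and the abstract does not say how the 18 affixes were combined to reach about 27.6 million words.

```python
from itertools import product

# Slot letters of the conventional template root (fa, ayn, lam).
# Assumption: these three letters appear in a pattern only as root slots.
SLOTS = ("ف", "ع", "ل")


def apply_pattern(root: str, pattern: str) -> str:
    """Substitute the three consonants of a triliteral root into a pattern."""
    if len(root) != 3:
        raise ValueError("expected a triliteral root")
    mapping = dict(zip(SLOTS, root))
    # Single pass over the pattern, so substituted letters are never re-mapped.
    return "".join(mapping.get(ch, ch) for ch in pattern)


def generate_words(roots, patterns, prefixes, suffixes):
    """Yield (word, root) pairs; the root serves as the gold label."""
    for root, pattern in product(roots, patterns):
        stem = apply_pattern(root, pattern)
        for prefix, suffix in product(prefixes, suffixes):
            yield prefix + stem + suffix, root


def accuracy(find_root, labelled_words):
    """Fraction of words for which find_root returns the gold root."""
    pairs = list(labelled_words)
    if not pairs:
        return 0.0
    hits = sum(1 for word, root in pairs if find_root(word) == root)
    return hits / len(pairs)


if __name__ == "__main__":
    # Illustrative inputs only; not taken from the paper's lists.
    corpus = list(generate_words(
        roots=["كتب"],
        patterns=["فاعل", "مفعول"],
        prefixes=["", "ال"],
        suffixes=["", "ون"],
    ))
    for word, root in corpus:
        print(word, root)
```

Because every generated word carries the root it was built from, any root-finding function with the signature `find_root(word) -> str` can be scored on the same data with `accuracy`, which is the kind of unified comparison the abstract describes.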

Date: 2010
Downloads: (external link)
https://doi.org/10.1002/asi.21301

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:61:y:2010:i:5:p:1015-1024

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890
