Information content in textual data: Revisited for Arabic text
Nadia Hegazi,
Nabil Ali and
Ehsan Abed
Journal of the American Society for Information Science, 1987, vol. 38, issue 2, 133-137
Abstract:
Owing to its morphology, Arabic is a highly redundant language compared with English. A study was done to measure this redundancy and compare it with the corresponding values for English. Samples of books, newspapers, and social magazines were used to measure the entropy of the Arabic language using the n‐gram method, with n‐grams generated from a moving window of eight characters. The dependencies of characters on each other were studied, as was the average distribution of word lengths. The results indicate that Arabic is more compressible than English, which is attributed to its morphological nature. The average length of Arabic words was found to be greater than that of English words because Arabic words carry morphological extensions. © 1987 John Wiley & Sons, Inc.
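The abstract does not spell out the estimator, so the following is only a minimal sketch of one common way to measure n‐gram entropy with a sliding character window, not the authors' procedure. It assumes the conditional-entropy approximation (entropy of n‐grams minus entropy of (n-1)‐grams) and assumes each Python character (Unicode code point) counts as one symbol. All function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every n-character substring seen by a window that slides one character at a time."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def block_entropy(text: str, n: int) -> float:
    """Shannon entropy, in bits, of the empirical n-gram distribution."""
    counts = ngram_counts(text, n)
    total = sum(counts.values())
    if total == 0:  # text shorter than n: nothing to measure
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(text: str, n: int) -> float:
    """Approximate bits per character given the previous n-1 characters (assumed estimator)."""
    if n == 1:
        return block_entropy(text, 1)
    return block_entropy(text, n) - block_entropy(text, n - 1)


def redundancy(text: str, n: int) -> float:
    """Relative redundancy: 1 - entropy / log2(alphabet size), using the sample's own alphabet."""
    alphabet_size = len(set(text))
    if alphabet_size < 2:  # a single-symbol text is fully predictable
        return 1.0
    return 1.0 - conditional_entropy(text, n) / math.log2(alphabet_size)
```

To mirror the eight-character window mentioned in the abstract, one could evaluate conditional_entropy(sample, n) for n from 1 to 8 on the same sample. With small samples the higher-order estimates tend to come out too low, because most long n‐grams occur only once, so comparisons between languages are only meaningful on corpora of similar and sufficient size.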
Date: 1987
Downloads:
https://doi.org/10.1002/(SICI)1097-4571(198703)38:23.0.CO;2-P
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:38:y:1987:i:2:p:133-137
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571
More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.