Information content in textual data: Revisited for Arabic text

Nadia Hegazi, Nabil Ali and Ehsan Abed

Journal of the American Society for Information Science, 1987, vol. 38, issue 2, 133-137

Abstract: Unlike English, Arabic is a highly redundant language because of its morphological nature. A study was conducted to measure this redundancy and compare it with the corresponding values for English. Samples of books, newspapers, and social magazines were used to measure the entropy of Arabic with the n-gram method, using n-grams generated from a moving window of eight characters. The dependencies of characters on one another were also studied, as was the average distribution of word lengths. The results indicate that Arabic is more compressible than English, a consequence of its morphological nature. Arabic words were also found to be longer on average than English words, because they carry morphological extensions. © 1987 John Wiley & Sons, Inc.
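The abstract names the measurements but not the exact estimator. The following is a minimal Python sketch, assuming Shannon-style n-gram estimates: `conditional_entropy` gives the entropy of a character given the n-1 characters before it, computed from a window that slides one character at a time (n from 1 to 8 mirrors the eight-character window). `redundancy` uses the usual R = 1 - H / log2(alphabet size). The word-length helpers assume plain whitespace tokenisation. All function names, the sample text and the 27-symbol alphabet are hypothetical illustrations, not the authors' code or data.

```python
import math
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count n-grams seen through a window of n characters that slides
    one character at a time over `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def conditional_entropy(text: str, n: int) -> float:
    """Estimate F_n: entropy (bits/char) of a character given the n-1
    characters before it, from empirical n-gram frequencies.
    For n == 1 every prefix is "", so this is the single-character entropy."""
    grams = ngram_counts(text, n)
    total = sum(grams.values())
    if total == 0:
        return 0.0
    # Marginal counts of each (n-1)-character context, taken from the same windows.
    prefixes = Counter()
    for gram, count in grams.items():
        prefixes[gram[:-1]] += count
    # -sum over n-grams of p(context, c) * log2 p(c | context)
    return -sum(
        (count / total) * math.log2(count / prefixes[gram[:-1]])
        for gram, count in grams.items()
    )


def redundancy(entropy_per_char: float, alphabet_size: int) -> float:
    """Shannon redundancy R = 1 - H / log2(alphabet_size)."""
    return 1 - entropy_per_char / math.log2(alphabet_size)


def word_length_distribution(text: str) -> dict:
    """Relative frequency of each word length (whitespace tokenisation)."""
    lengths = Counter(len(word) for word in text.split())
    total = sum(lengths.values())
    return {length: count / total for length, count in sorted(lengths.items())}


def mean_word_length(text: str) -> float:
    """Average word length in characters (whitespace tokenisation)."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


if __name__ == "__main__":
    # Hypothetical toy sample; a real study needs a large corpus.
    sample = "the cat sat on the mat and the cat ate the rat"
    ALPHABET_SIZE = 27  # assumption: 26 English letters plus space
    for n in range(1, 9):  # window lengths 1..8
        f_n = conditional_entropy(sample, n)
        print(n, round(f_n, 3), round(redundancy(f_n, ALPHABET_SIZE), 3))
    print(word_length_distribution(sample), mean_word_length(sample))
```

On a short sample the higher-order estimates collapse toward zero because most long n-grams occur only once. This is why corpus size matters when the window is pushed to eight characters.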

Date: 1987

Downloads: (external link)
https://doi.org/10.1002/(SICI)1097-4571(198703)38:23.0.CO;2-P

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:38:y:1987:i:2:p:133-137

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571

More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Handle: RePEc:bla:jamest:v:38:y:1987:i:2:p:133-137