TOWARDS THE QUANTIFICATION OF THE SEMANTIC INFORMATION ENCODED IN WRITTEN LANGUAGE

Marcelo A. Montemurro and Damián H. Zanette
Additional contact information
Marcelo A. Montemurro: Faculty of Life Sciences, The University of Manchester, M13 9PT, Manchester, United Kingdom
Damián H. Zanette: Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Atómico Bariloche and Instituto Balseiro, 8400 San Carlos de Bariloche, Río Negro, Argentina

Advances in Complex Systems (ACS), 2010, vol. 13, issue 02, 135-153

Abstract: Written language is a complex communication signal capable of conveying information encoded in the form of ordered sequences of words. Beyond the local order ruled by grammar, semantic and thematic structures affect long-range patterns in word usage. Here, we show that a direct application of information theory quantifies the relationship between the statistical distribution of words and the semantic content of the text. We show that there is a characteristic scale, of roughly a few thousand words, which establishes the typical size of the most informative segments in written language. Moreover, we find that the words whose contribution to the overall information is larger are the ones more closely associated with the main subjects and topics of the text. This scenario can be explained by a model of word usage that assumes that words are distributed along the text in domains of a characteristic size where their frequency is higher than elsewhere. Our conclusions are based on the analysis of a large database of written language, diverse in subjects and styles, and thus are likely to be applicable to general language sequences encoding complex information.
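
Illustrative sketch (not taken from the article): one way to picture the kind of measurement the abstract describes is to cut a text into equal-sized segments and compute the mutual information between word identity and segment index. The function names, the equal-length segmentation, and the comparison against randomly shuffled copies of the text are all assumptions made here for illustration; the record above does not specify the authors' actual procedure.

```python
import math
import random
from collections import Counter


def segment_word_information(words, segment_size):
    """Mutual information (in bits) between word identity and segment index.

    Hypothetical helper: the text is cut into consecutive segments of
    `segment_size` words; any leftover words at the end are dropped.
    """
    n_segments = len(words) // segment_size
    if n_segments < 1:
        raise ValueError("text is shorter than one segment")
    used = words[: n_segments * segment_size]
    total = len(used)

    word_counts = Counter(used)
    joint_counts = Counter((w, i // segment_size) for i, w in enumerate(used))
    p_segment = segment_size / total  # all segments have the same length

    info = 0.0
    for (word, _segment), count in joint_counts.items():
        p_joint = count / total
        p_word = word_counts[word] / total
        info += p_joint * math.log2(p_joint / (p_word * p_segment))
    return info


def excess_information(words, segment_size, n_shuffles=20, seed=0):
    """Information in the real word order minus the mean over shuffled copies.

    Assumption: shuffling destroys word order but keeps word frequencies,
    so the difference isolates the part of the information due to ordering.
    """
    rng = random.Random(seed)
    original = segment_word_information(words, segment_size)
    shuffled_sum = 0.0
    for _ in range(n_shuffles):
        shuffled = list(words)
        rng.shuffle(shuffled)
        shuffled_sum += segment_word_information(shuffled, segment_size)
    return original - shuffled_sum / n_shuffles
```

Under these assumptions, evaluating excess_information over a range of segment_size values would be one way to look for a scale at which the segments are most informative, and the per-word terms of the sum would indicate which words contribute most.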

Keywords: Natural language; information theory; complex communication
Date: 2010

Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0219525910002530
Access to full text is restricted to subscribers


Persistent link: https://EconPapers.repec.org/RePEc:wsi:acsxxx:v:13:y:2010:i:02:n:s0219525910002530

DOI: 10.1142/S0219525910002530

Advances in Complex Systems (ACS) is currently edited by Frank Schweitzer

Advances in Complex Systems (ACS) is published by World Scientific Publishing Co. Pte. Ltd.

 
Handle: RePEc:wsi:acsxxx:v:13:y:2010:i:02:n:s0219525910002530