EconPapers
 

The deconstruction of a text: the permanence of the generalized Zipf law—the inter-textual relationship between entropy and effort amount

Thierry Lafouge, Abdellatif Agouzal and Genevieve Lallich
Additional contact information
Thierry Lafouge: Université de Lyon
Abdellatif Agouzal: Université Lyon 1
Genevieve Lallich: Université de Lyon

Scientometrics, 2015, vol. 104, issue 1, No 9, 193-217

Abstract: Zipf's law has intrigued researchers for a long time. This distribution models a particular type of statistical regularity observed in texts. George K. Zipf showed that, if a word is characterised by its frequency, then rank and frequency are not independent and approximately satisfy the relationship $\text{rank} \times \text{frequency} \approx \text{constant}$. Various explanations of this law have been advanced. In this article we discuss the Mandelbrot process, which comprises two very different approaches. In the first, Mandelbrot studies language generation as the transmission of a signal and grounds it in information theory, using the concept of entropy. In the second, geometric approach, he draws a parallel with fractal theory, in which each word of the text is a sequence of characters framed by two separators, that is, a simple geometric pattern. This leads us to hypothesise that, since the observed statistical regularities admit several possible explanations, Zipf's law carries other patterns. To test this hypothesis, we chose a text, which we modified and degraded in several successive steps, calling $T_i$ the text degraded at step $i$. We then segmented $T_i$ into words and found that rank and frequency were again not independent, approximately satisfying $\text{rank}^{\beta_i} \times \text{frequency} \approx \text{constant}$, with $\beta_i > 1$ (Eq. 1), which we call the generalized Zipf law. The coefficient $\beta_i$ increases with each step $i$. We found statistical regularities in the deconstruction of the text, notably a linear relationship between the entropy $H_i$ and the amount of effort $E_i$ of the various degraded texts $T_i$. To verify our assumptions, we degraded a text of approximately 200 pages and calculated, at each step, various parameters such as the entropy, the amount of effort and the coefficient $\beta_i$. We observed an inter-textual relationship between entropy and the amount of effort; this paper provides a proof of this relationship.
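A minimal sketch (Python, not taken from the paper) of the quantities the abstract refers to: the rank-frequency distribution of a segmented text, a least-squares estimate of the exponent β_i, the entropy H_i and the amount of effort E_i. The cost assigned to a word of rank r (log2 r, one common reading of Mandelbrot's argument), the word segmentation, the file name and the degradation step shown are assumptions for illustration, not the authors' procedure.

    import math
    import re
    from collections import Counter

    def rank_frequency(text):
        """Segment a text into words and return frequencies sorted by rank (rank 1 first)."""
        words = re.findall(r"\w+", text.lower())
        return sorted(Counter(words).values(), reverse=True)

    def zipf_beta(freqs):
        """Least-squares slope of log(frequency) vs log(rank): frequency ~ C / rank**beta."""
        xs = [math.log(r) for r in range(1, len(freqs) + 1)]
        ys = [math.log(f) for f in freqs]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
        return -slope  # beta = 1 recovers the classical Zipf law

    def entropy_and_effort(freqs):
        """Entropy H = -sum p*log2(p); effort E = sum p*log2(rank), with log2(rank) as the assumed word cost."""
        total = sum(freqs)
        probs = [f / total for f in freqs]
        H = -sum(p * math.log2(p) for p in probs)
        E = sum(p * math.log2(r) for r, p in enumerate(probs, start=1))
        return H, E

    # Example: compare an original text with a crudely "degraded" copy
    # (vowel removal is only an illustrative degradation, not the paper's).
    if __name__ == "__main__":
        original = open("text.txt", encoding="utf-8").read()  # hypothetical input file
        degraded = re.sub(r"[aeiou]", "", original)
        for label, txt in (("T0", original), ("T1", degraded)):
            freqs = rank_frequency(txt)
            H, E = entropy_and_effort(freqs)
            print(label, "beta =", round(zipf_beta(freqs), 3),
                  "H =", round(H, 3), "E =", round(E, 3))

Under these assumptions, a more degraded text should yield a larger beta, a lower entropy H and a lower effort E, which is the kind of joint behaviour the paper relates linearly.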

Keywords: Zipf’s law; Signal theory; Entropy; Text
Date: 2015
References: available in EconPapers; complete reference list from CitEc
Citations: 1 (in EconPapers)

Downloads: (external link)
http://link.springer.com/10.1007/s11192-015-1600-z Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.


Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:104:y:2015:i:1:d:10.1007_s11192-015-1600-z

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-015-1600-z


Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:scient:v:104:y:2015:i:1:d:10.1007_s11192-015-1600-z