EconPapers    
Economics at your fingertips  
 

Long-Range Memory in Literary Texts: On the Universal Clustering of the Rare Words

Kumiko Tanaka-Ishii and Armin Bunde

PLOS ONE, 2016, vol. 11, issue 11, 1-14

Abstract: A fundamental problem in linguistics is how literary texts can be quantified mathematically. It is well known that the frequency of a (rare) word in a text is roughly inverse proportional to its rank (Zipf’s law). Here we address the complementary question, if also the rhythm of the text, characterized by the arrangement of the rare words in the text, can be quantified mathematically in a similar basic way. To this end, we consider representative classic single-authored texts from England/Ireland, France, Germany, China, and Japan. In each text, we classify each word by its rank. We focus on the rare words with ranks above some threshold Q and study the lengths of the (return) intervals between them. We find that for all texts considered, the probability SQ(r) that the length of an interval exceeds r, follows a perfect Weibull-function, SQ(r) = exp(−b(β)rβ), with β around 0.7. The return intervals themselves are arranged in a long-range correlated self-similar fashion, where the autocorrelation function CQ(s) of the intervals follows a power law, CQ(s) ∼ s−γ, with an exponent γ between 0.14 and 0.48. We show that these features lead to a pronounced clustering of the rare words in the text.

Date: 2016
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (2)

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0164658 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 64658&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0164658

DOI: 10.1371/journal.pone.0164658

Access Statistics for this article

More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone ().

 
Page updated 2025-03-19
Handle: RePEc:plo:pone00:0164658