Negative correlation of word rank sequence in written texts
Takuya Yamamoto, Syunya Yamada and Tsuyoshi Mizuguchi
Additional contact information
Takuya Yamamoto: Osaka Prefecture University
Syunya Yamada: Osaka Prefecture University
Tsuyoshi Mizuguchi: Osaka Prefecture University
The European Physical Journal B: Condensed Matter and Complex Systems, 2021, vol. 94, issue 10, 1-9
Abstract:
The structure of written texts is analyzed by focusing on word sequences. As a method, word sequences in texts are transformed into rank sequences of the occurrence frequency of each word, and return maps are drawn. The features of word sequences are extracted by comparison with surrogate data, i.e., a sequence in which all the words are randomly rearranged. A total of 140 written texts in ten languages are selected for analysis. To characterize the distribution in the return map quantitatively, two characteristic quantities are defined: the distance between the original distribution and the surrogate distribution, and the correlation coefficient of adjacent word ranks. The results show that there is a negative correlation in the rank of adjacent words in almost all languages, and that the return maps of texts in the same language have similar features. A clustering structure that implies a relation to language (sub)family is observed. A mathematical model is proposed for reproducing features of the return map for multiple languages. The numerical simulations achieve results quantitatively similar to those of the real data.
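The pipeline described in the abstract (rank sequence, return map, shuffled surrogate, adjacent-rank correlation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the alphabetical tie-breaking rule, the use of raw rather than transformed ranks, and the choice of the Pearson coefficient are all assumptions made for illustration, since the abstract does not specify them. The distance between the original and surrogate distributions is left out because the abstract does not define it.

```python
import random
from collections import Counter


def rank_sequence(words):
    """Replace each word by its frequency rank (1 = most frequent word)."""
    counts = Counter(words)
    # Assumption: ties in frequency are broken alphabetically.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    rank_of = {w: i + 1 for i, w in enumerate(ordered)}
    return [rank_of[w] for w in words]


def return_map(ranks):
    """Points (r_t, r_{t+1}) formed by adjacent ranks in the sequence."""
    return list(zip(ranks[:-1], ranks[1:]))


def surrogate(words, seed=0):
    """Randomly rearranged copy of the word sequence (same word counts)."""
    rng = random.Random(seed)
    shuffled = list(words)
    rng.shuffle(shuffled)
    return shuffled


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / (var_x * var_y) ** 0.5


def adjacent_rank_correlation(words):
    """Correlation between the ranks of neighbouring words in a text."""
    return pearson(return_map(rank_sequence(words)))
```

For a tokenized text `words`, comparing `adjacent_rank_correlation(words)` with `adjacent_rank_correlation(surrogate(words))` gives a rough analogue of the original-versus-surrogate comparison described above. Tokenization is language dependent and is outside the scope of this sketch.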
Date: 2021
Abstract (text/html): http://link.springer.com/10.1140/epjb/s10051-021-00210-y
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:eurphb:v:94:y:2021:i:10:d:10.1140_epjb_s10051-021-00210-y
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/10051
DOI: 10.1140/epjb/s10051-021-00210-y
The European Physical Journal B: Condensed Matter and Complex Systems is currently edited by P. Hänggi and Angel Rubio