EconPapers    
Economics at your fingertips  
 

Universal and non-universal text statistics: Clustering coefficient for language identification

Diego Espitia and Hernán Larralde

Physica A: Statistical Mechanics and its Applications, 2020, vol. 553, issue C

Abstract: In this work we analyze statistical properties of 91 relatively small texts in 7 different languages (Spanish, English, French, German, Turkish, Russian, Icelandic) as well as texts with randomly inserted spaces. Despite the size (around 11260 different words), the well known universal statistical laws – namely Zipf and Herdan–Heap’s laws – are confirmed, and are in close agreement with results obtained elsewhere. We also construct a word co-occurrence network of each text. While the degree distribution is again universal, we note that the distribution of clustering coefficients, which depend strongly on the local structure of networks, can be used to differentiate between languages, as well as to distinguish natural languages from random texts.

Keywords: Language statistics; Zipf’s law; Heaps’ law; Co-occurrence network; Clustering coefficient; Language identification (search for similar items in EconPapers)
Date: 2020
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0378437119321673
Full text for ScienceDirect subscribers only. Journal offers the option of making the article available online on Science direct for a fee of $3,000

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:eee:phsmap:v:553:y:2020:i:c:s0378437119321673

DOI: 10.1016/j.physa.2019.123905

Access Statistics for this article

Physica A: Statistical Mechanics and its Applications is currently edited by K. A. Dawson, J. O. Indekeu, H.E. Stanley and C. Tsallis

More articles in Physica A: Statistical Mechanics and its Applications from Elsevier
Bibliographic data for series maintained by Catherine Liu ().

 
Page updated 2025-03-19
Handle: RePEc:eee:phsmap:v:553:y:2020:i:c:s0378437119321673