
Robustness of sentence length measures in written texts

Denner S. Vieira, Sergio Picoli and Renio S. Mendes

Physica A: Statistical Mechanics and its Applications, 2018, vol. 506, issue C, 749-754

Abstract: Hidden structural patterns in written texts have been the subject of considerable research in recent decades. In particular, mapping a text into a time series of sentence lengths is a natural way to investigate text structure. Typically, sentence length has been quantified using measures based on the number of words and the number of characters, but other variations are possible. To quantify the robustness of different sentence length measures, we analyzed a database containing about five hundred books in English. For each book, we extracted six distinct measures of sentence length, including the number of words and the number of characters (taking into account lemmatization and stop-word removal). We compared these six measures for each book by using (i) Pearson’s coefficient to investigate linear correlations; (ii) the Kolmogorov–Smirnov test to compare distributions; and (iii) detrended fluctuation analysis (DFA) to quantify auto-correlations. We found that all six measures exhibit very similar behavior, suggesting that sentence length is a robust measure related to text structure.
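The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the regex sentence splitter, the function names, and the choice to divide each series by its mean before comparing distributions are all assumptions made for illustration. Only two of the six measures (words and characters per sentence) are shown, and lemmatization, stop-word removal and the DFA step are omitted.

```python
import re
from bisect import bisect_right
from math import sqrt


def split_sentences(text):
    # Naive splitter (assumption): a sentence ends at '.', '!' or '?'.
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def length_series(text):
    # Map a text to two sentence-length series: words and non-space characters.
    sentences = split_sentences(text)
    words = [len(s.split()) for s in sentences]
    chars = [len("".join(s.split())) for s in sentences]
    return words, chars


def pearson(x, y):
    # Pearson's linear correlation coefficient between two equal-length series.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def normalize(x):
    # Rescale by the mean so series in different units are comparable
    # (an assumption of this sketch, not taken from the paper).
    m = sum(x) / len(x)
    return [v / m for v in x]


def ks_statistic(x, y):
    # Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    # the two empirical cumulative distribution functions.
    xs, ys = sorted(x), sorted(y)
    points = sorted(set(xs) | set(ys))
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in points
    )


if __name__ == "__main__":
    sample = "The cat sat. A dog barked loudly at the cat! Why?"
    words, chars = length_series(sample)
    print("words per sentence:", words)
    print("chars per sentence:", chars)
    print("Pearson r:", pearson(words, chars))
    print("KS statistic:", ks_statistic(normalize(words), normalize(chars)))
```

A Pearson coefficient near 1 together with a small KS statistic would indicate that the two measures carry essentially the same information for that text, which is the sense of robustness the abstract refers to.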

Keywords: Sentence length; Time series; Linear correlation; Probability distribution; Auto-correlation
Date: 2018

Physica A: Statistical Mechanics and its Applications is currently edited by K. A. Dawson, J. O. Indekeu, H.E. Stanley and C. Tsallis

Bibliographic data for series maintained by Dana Niculescu.

Page updated 2018-08-04
Handle: RePEc:eee:phsmap:v:506:y:2018:i:c:p:749-754