EconPapers    
Economics at your fingertips  
 

Stylometry and Numerals Usage: Benford’s Law and Beyond

Andrei V. Zenkov
Additional contact information
Andrei V. Zenkov: Department of Modelling of Controllable Systems, Ural Federal University, 620002 Ekaterinburg, Russia

Stats, 2021, vol. 4, issue 4, 1-18

Abstract: We suggest two approaches to the statistical analysis of texts, both based on the study of numerals occurrence in literary texts. The first approach is related to Benford’s Law and the analysis of the frequency distribution of various leading digits of numerals contained in the text. In coherent literary texts, the share of the leading digit 1 is even larger than prescribed by Benford’s Law and can reach 50 percent. The frequencies of occurrence of the digit 1, as well as, to a lesser extent, the digits 2 and 3, are usually a characteristic the author’s style feature, manifested in all (sufficiently long) literary texts of any author. This approach is convenient for testing whether a group of texts has common authorship: the latter is dubious if the frequency distributions are sufficiently different. The second approach is the extension of the first one and requires the study of the frequency distribution of numerals themselves (not their leading digits). The approach yields non-trivial information about the author, stylistic and genre peculiarities of the texts and is suited for the advanced stylometric analysis. The proposed approaches are illustrated by examples of computer analysis of the literary texts in English and Russian.

Keywords: Benford’s Law; first significant digit; leading digit; numerals in texts; quantitative linguistics; stylometry; attribution of texts; text authorship (search for similar items in EconPapers)
JEL-codes: C1 C10 C11 C14 C15 C16 (search for similar items in EconPapers)
Date: 2021
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2571-905X/4/4/60/pdf (application/pdf)
https://www.mdpi.com/2571-905X/4/4/60/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jstats:v:4:y:2021:i:4:p:60-1068:d:702592

Access Statistics for this article

Stats is currently edited by Mrs. Minnie Li

More articles in Stats from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-03-19
Handle: RePEc:gam:jstats:v:4:y:2021:i:4:p:60-1068:d:702592