On measurement of distances between texts in dictionary-based content analysis

Anton Oleinik, Memorial University of Newfoundland

Quality & Quantity: International Journal of Methodology, 2025, vol. 59, issue 1, No 6, 125-145

Abstract: The article discusses the measurement of distances between heterogeneous texts. Some limitations of WordStat, a popular off-the-shelf software package for content analysis, when measuring distances between texts, are identified and investigated with the help of an experiment. A corpus of texts (c. 4 million words) composed of political leaders’ speeches and news items about Russia’s invasion of Ukraine in three languages was analyzed twice, using WordStat and an algorithm with explicitly set parameters. The same custom-built dictionary was used in both cases. A larger corpus of texts (c. 16 million words) was also analyzed using an extended version of the dictionary and the proposed metrics, Sigma (the standard deviation of observed frequencies from expected frequencies) and Cohen’s d. Some remedies are discussed, including the additional processing of output generated by WordStat and adding Sigma to the list of (dis)similarity measures.
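
The abstract names Sigma and Cohen's d but does not give their formulas. The Python sketch below is one possible reading, not the article's implementation: `sigma` is assumed here to be the root-mean-square deviation of observed dictionary-category frequencies from expected ones, and `cohens_d` is the textbook pooled-standard-deviation form. The function names and the sample numbers are hypothetical.

```python
import math
from statistics import mean, variance


def sigma(observed, expected):
    """Root-mean-square deviation of observed from expected frequencies.

    Assumption: this is one plausible reading of "the standard deviation of
    observed frequencies from expected frequencies"; the article may define
    Sigma differently (for example, with a different normalisation).
    """
    if len(observed) != len(expected) or not observed:
        raise ValueError("observed and expected must be non-empty and equal in length")
    squared = [(o - e) ** 2 for o, e in zip(observed, expected)]
    return math.sqrt(sum(squared) / len(squared))


def cohens_d(sample_a, sample_b):
    """Textbook Cohen's d: difference of means over the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var)


# Hypothetical category frequencies (per 10,000 words) for one text
# versus the frequencies expected from a reference text.
observed = [3.0, 5.0]
expected = [0.0, 1.0]
print(sigma(observed, expected))

# Hypothetical per-document frequencies of one category in two groups of texts.
print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))
```

Under this reading, a Sigma of zero means the text matches the reference profile on every dictionary category, and larger values indicate a greater distance between the two texts.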

Keywords: Content analysis; WordStat; Off-the-shelf programs; Distance measures; Heterogeneous corpora; Multi-language lexicons
Date: 2025
Downloads: (external link)
http://link.springer.com/10.1007/s11135-024-01933-7 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:qualqt:v:59:y:2025:i:1:d:10.1007_s11135-024-01933-7

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11135

DOI: 10.1007/s11135-024-01933-7

Quality & Quantity: International Journal of Methodology is currently edited by Vittorio Capecchi

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-04-02
Handle: RePEc:spr:qualqt:v:59:y:2025:i:1:d:10.1007_s11135-024-01933-7