A Language and Its Holes: The First-Order Homology of the Large-Scale Geometrical Structure of a Natural Language

Vasilii A. Gromov, Quynh Nhu Dang and Asel S. Erbolova

Complexity, 2025, vol. 2025, 1-15

Abstract: This paper uses topological data analysis to reveal ‘holes’ (stable persistent homologies) in the semantic spaces of words, bigrams, and trigrams of English and Russian, and to ascertain their boundaries. It then selects the holes that belong to the large-scale (coarse-grained) structure of the language rather than being mere local inhomogeneities of the sample; there appear to be around a dozen such holes for each language. Their boundaries delineate ‘blind spots’ of the respective language: regions of the semantic space that contain no words, bigrams, or trigrams of the language, that is, regions of concepts that the language cannot see through its lens. The secondary goal of the paper is to solve the bot-detection problem in its strong statement, in which classifiers are trained on one set of bots and tested on another. To this end, we estimate the average distances from the words, bigrams, and trigrams of a text to the boundary of the nearest ‘hole’, for both human-written and bot-generated texts, and construct classifiers. The classifiers show comparatively good results: the average accuracy amounts to 0.8.
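The distance feature mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each hole boundary is already available as a finite set of embedding vectors, and it approximates the distance to a boundary by the distance to the closest boundary vertex. The function name `mean_distance_to_nearest_hole` and the toy coordinates are hypothetical.

```python
import numpy as np


def mean_distance_to_nearest_hole(points, hole_boundaries):
    """Average, over all points, of the distance to the nearest hole boundary.

    points          : (n, d) array-like of embeddings (e.g. the words of one text).
    hole_boundaries : list of (m_i, d) array-likes, one per hole, holding the
                      boundary vertices (assumed to be precomputed elsewhere).
    Assumption: a boundary is approximated by its vertices only.
    """
    points = np.asarray(points, dtype=float)
    per_hole = []
    for boundary in hole_boundaries:
        boundary = np.asarray(boundary, dtype=float)
        # Pairwise Euclidean distances between points and boundary vertices: (n, m_i)
        diffs = points[:, None, :] - boundary[None, :, :]
        dists = np.linalg.norm(diffs, axis=2)
        # Distance from each point to this hole's boundary: (n,)
        per_hole.append(dists.min(axis=1))
    # For each point keep the nearest hole, then average over the text
    nearest = np.min(np.stack(per_hole, axis=0), axis=0)
    return float(nearest.mean())


if __name__ == "__main__":
    # Toy 2-D data: two hypothetical hole boundaries and three text points
    square = [[0, 0], [1, 0], [1, 1], [0, 1]]
    segment = [[10, 10], [11, 10]]
    text_points = [[0, 0], [3, 0], [10, 13]]
    print(mean_distance_to_nearest_hole(text_points, [square, segment]))
```

Under these assumptions, one such value per text (computed separately for words, bigrams, and trigrams) could serve as an input feature for a classifier that separates human-written from bot-generated texts.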

Date: 2025
Downloads: (external link)
http://downloads.hindawi.com/journals/complexity/2025/9659172.pdf (application/pdf)
http://downloads.hindawi.com/journals/complexity/2025/9659172.xml (application/xml)

Persistent link: https://EconPapers.repec.org/RePEc:hin:complx:9659172

DOI: 10.1155/cplx/9659172

More articles in Complexity from Hindawi
Bibliographic data for series maintained by Mohamed Abdelhakeem.

Page updated 2025-11-17
Handle: RePEc:hin:complx:9659172