EconPapers    

Modeling texts with networks: comparing five approaches to sentence representation

Davi Alves Oliveira and Hernane Borges de Barros Pereira
Davi Alves Oliveira: University of Bahia State (UNEB)
Hernane Borges de Barros Pereira: University of Bahia State (UNEB)

The European Physical Journal B: Condensed Matter and Complex Systems, 2024, vol. 97, issue 6, 1-12

Abstract: Complex networks offer a powerful framework for modeling linguistic phenomena. This study compares five distinct methods for representing sentences as networks, each with its own edge definition: (1) a lines approach, where edges represent token (e.g., word) adjacency; (2) a close-range co-occurrence approach, where edges are based on the probability of tokens co-occurring at distance one or two; (3) a cliques approach, where edges connect tokens co-occurring within the same sentence; (4) a dependency-based approach, where edges are defined by syntactic dependencies extracted by a parser; and (5) an IF-trimmed-subgraphs approach, where edges are determined by the Incidence-Fidelity (IF) Index. While the first four approaches are well established in the literature, the last one is a novel proposal. We also examined the effects of restricting the vertices to lemmas (i.e., words with inflections removed) and to lexical lemmas (i.e., nouns, adjectives, verbs, and adverbs) instead of unaltered words. Our results reveal that these approaches yield networks with different average minimal path lengths and degrees, which influences how results are interpreted. While small-world behavior remains consistent across networks, the analysis of scale-free behavior is affected. Notably, excluding functional words significantly alters degree distributions. In order of relevance, and depending on the resources available, we recommend the dependency-based, close-range co-occurrence, and lines approaches when syntactic relations are central, and the IF-trimmed-subgraphs and cliques approaches when semantic relations are central.
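The first three edge definitions can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: sentences are whitespace-tokenized, edges are unweighted and undirected, and the close-range variant simply links tokens at distance one or two instead of estimating the co-occurrence probabilities the paper uses. The dependency-based and IF-trimmed-subgraphs approaches are left out because they need a syntactic parser and the IF index definition from the full text. All function names are hypothetical.

```python
from itertools import combinations

Edge = frozenset  # an undirected edge is an unordered pair of tokens


def lines_edges(tokens):
    """Lines approach: link each token to the token that immediately follows it."""
    return {Edge((a, b)) for a, b in zip(tokens, tokens[1:]) if a != b}


def close_range_edges(tokens, max_distance=2):
    """Close-range approach (simplified): link tokens at distance 1..max_distance.

    Assumption: edges are unweighted here; the paper bases edges on the
    probability of tokens co-occurring at distance one or two.
    """
    edges = set()
    for i, a in enumerate(tokens):
        for j in range(i + 1, min(i + max_distance + 1, len(tokens))):
            b = tokens[j]
            if a != b:
                edges.add(Edge((a, b)))
    return edges


def clique_edges(tokens):
    """Cliques approach: link every pair of distinct tokens in the same sentence."""
    return {Edge(pair) for pair in combinations(set(tokens), 2)}


def build_network(sentences, edge_fn):
    """Union the per-sentence edge sets into one network over all sentences."""
    edges = set()
    for sentence in sentences:
        edges |= edge_fn(sentence.lower().split())
    return edges


sentence = "we calculated two sets of adjusted values as follows"
for name, fn in [("lines", lines_edges),
                 ("close-range", close_range_edges),
                 ("cliques", clique_edges)]:
    print(name, len(build_network([sentence], fn)))
# lines 8
# close-range 15
# cliques 36
```

For a sentence of nine distinct tokens, the edge counts grow from 8 (adjacent pairs only) to 15 (adjacent pairs plus pairs two positions apart) to 36 (all pairs), which shows why the cliques approach produces much denser networks with shorter minimal paths.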
Graphical abstract: Representation of the sentence “we calculated two sets of adjusted values as follows” under five approaches, namely (1) the lines approach, (2) the close-range co-occurrence approach, (3) the cliques approach, (4) the dependency-based approach, and (5) the IF-trimmed-subgraphs approach, combined with three vertex definitions: (1) vertices representing unaltered words, (2) vertices representing lemmas, and (3) vertices representing lexical lemmas.
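To make the three vertex definitions and two of the reported measures concrete, the following Python sketch builds a lines-approach network for the example sentence under each vertex definition, then computes its average degree and average minimal (shortest) path length. It is a sketch under loud assumptions: the lemma and part-of-speech annotations are hand-written rather than produced by a tagger, and when functional words are dropped the remaining tokens are assumed to be linked in their surviving order. All names (`ANNOTATED`, `vertices`, `lines_graph`, and so on) are hypothetical.

```python
from collections import deque

# Hypothetical hand-made annotation of the example sentence: (word, lemma, POS).
# A real pipeline would obtain lemmas and Universal POS tags from a tagger.
ANNOTATED = [
    ("we", "we", "PRON"),
    ("calculated", "calculate", "VERB"),
    ("two", "two", "NUM"),
    ("sets", "set", "NOUN"),
    ("of", "of", "ADP"),
    ("adjusted", "adjusted", "ADJ"),
    ("values", "value", "NOUN"),
    ("as", "as", "SCONJ"),
    ("follows", "follow", "VERB"),
]
LEXICAL_POS = {"NOUN", "ADJ", "VERB", "ADV"}  # nouns, adjectives, verbs, adverbs


def vertices(annotated, definition):
    """Return the token sequence under one of the three vertex definitions."""
    if definition == "words":
        return [w for w, _, _ in annotated]
    if definition == "lemmas":
        return [lem for _, lem, _ in annotated]
    if definition == "lexical_lemmas":
        # Assumption: functional words are removed before edges are built.
        return [lem for _, lem, pos in annotated if pos in LEXICAL_POS]
    raise ValueError(f"unknown vertex definition: {definition}")


def lines_graph(tokens):
    """Adjacency sets for the lines approach (consecutive tokens are linked)."""
    adj = {t: set() for t in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def average_degree(adj):
    """Mean number of neighbors per vertex."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable vertex pairs (BFS per vertex)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs


for definition in ("words", "lemmas", "lexical_lemmas"):
    g = lines_graph(vertices(ANNOTATED, definition))
    print(definition, len(g), round(average_degree(g), 3),
          round(average_path_length(g), 3))
# words 9 1.778 3.333
# lemmas 9 1.778 3.333
# lexical_lemmas 5 1.6 2.0
```

With a single sentence the word and lemma networks coincide because no two words share a lemma; the effect of lemmatization only appears once inflected forms recur across a corpus and their vertices merge. Dropping functional words, by contrast, already changes the vertex count, degrees, and path lengths here.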

Date: 2024

Downloads: http://link.springer.com/10.1140/epjb/s10051-024-00717-0 (abstract, text/html)
Access to the full text of the articles in this series is restricted.



Persistent link: https://EconPapers.repec.org/RePEc:spr:eurphb:v:97:y:2024:i:6:d:10.1140_epjb_s10051-024-00717-0

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/10051

DOI: 10.1140/epjb/s10051-024-00717-0


The European Physical Journal B: Condensed Matter and Complex Systems is currently edited by P. Hänggi and Angel Rubio

More articles in The European Physical Journal B: Condensed Matter and Complex Systems from Springer, EDP Sciences
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:eurphb:v:97:y:2024:i:6:d:10.1140_epjb_s10051-024-00717-0