Using word embedding to detect keywords in texts modeled as complex networks

Jorge A. V. Tohalino, Thiago Silva and Diego R. Amancio
Jorge A. V. Tohalino: University of São Paulo
Diego R. Amancio: University of São Paulo

Scientometrics, 2024, vol. 129, issue 7, No 1, 3599-3623

Abstract: Detecting keywords in texts is a task of paramount importance for many text mining applications. Graph-based techniques have commonly been used to automatically find the key concepts in texts. However, using the information provided by embeddings to enrich the graph structure has not been widely explored. In this context, this paper addresses the following question: can the quality of keywords extracted from a co-occurrence network be improved by integrating embeddings to enrich the network structure? In the adopted model, texts are represented as co-occurrence networks in which nodes are words and edges are established either by contextual or by semantic similarity. Two embedding approaches were used: Word2vec and Bidirectional Encoder Representations from Transformers (BERT). The results indicate that adding virtual edges can effectively enhance the discriminative capacity of co-occurrence networks. The best performance was achieved by incorporating a limited proportion of virtual (embedding) edges. A comparison of structural and dynamical network metrics showed that degree, PageRank, and accessibility performed best in the proposed model.
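The abstract describes the model only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It links adjacent words (window of 1) to form the co-occurrence network, then adds "virtual" edges between the most similar unconnected word pairs according to cosine similarity of word vectors, and finally ranks words by degree and by PageRank (power iteration, damping 0.85). Assumptions: the toy 3-dimensional vectors stand in for real Word2vec or BERT embeddings; the proportion of virtual edges is measured relative to the number of co-occurrence edges; the input tokens are assumed to be already preprocessed; all function names are hypothetical. The accessibility metric is not shown.

```python
import math
from itertools import combinations


def cooccurrence_edges(tokens, window=1):
    """Undirected edges between each word and the next `window` words (no self-loops)."""
    edges = set()
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            if tokens[j] != w:
                edges.add(frozenset((w, tokens[j])))
    return edges


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def add_virtual_edges(nodes, edges, vectors, proportion):
    """Add the most similar unconnected pairs as virtual edges.

    Assumption: `proportion` is the number of virtual edges as a fraction
    of the number of co-occurrence edges.
    """
    k = int(round(proportion * len(edges)))
    candidates = []
    for u, v in combinations(sorted(nodes), 2):
        if frozenset((u, v)) not in edges and u in vectors and v in vectors:
            candidates.append((cosine(vectors[u], vectors[v]), u, v))
    candidates.sort(reverse=True)  # most similar pairs first
    virtual = {frozenset((u, v)) for _, u, v in candidates[:k]}
    return edges | virtual, virtual


def adjacency(nodes, edges):
    adj = {n: set() for n in nodes}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    return adj


def pagerank(adj, damping=0.85, iters=100):
    """Plain power-iteration PageRank on an undirected graph."""
    n = len(adj)
    rank = {u: 1.0 / n for u in adj}
    for _ in range(iters):
        new = {u: (1 - damping) / n for u in adj}
        for u, nbrs in adj.items():
            if nbrs:
                share = damping * rank[u] / len(nbrs)
                for v in nbrs:
                    new[v] += share
            else:  # isolated node: spread its rank uniformly
                for v in adj:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank


# Toy example; tokens and vectors are hypothetical stand-ins for real data.
tokens = ["network", "model", "text", "graph", "word", "text", "network"]
vectors = {
    "network": [0.9, 0.1, 0.0],
    "graph": [0.8, 0.2, 0.1],
    "model": [0.1, 0.9, 0.2],
    "text": [0.2, 0.1, 0.9],
    "word": [0.1, 0.2, 0.8],
}
nodes = set(tokens)
cooc = cooccurrence_edges(tokens, window=1)
edges, virtual = add_virtual_edges(nodes, cooc, vectors, proportion=0.2)
adj = adjacency(nodes, edges)
degree = {u: len(nbrs) for u, nbrs in adj.items()}
pr = pagerank(adj)
print(sorted(pr, key=pr.get, reverse=True))
```

In this toy run a single virtual edge is added (between "network" and "graph", the most similar unconnected pair). Keeping `proportion` small mirrors the abstract's finding that a limited proportion of virtual edges worked best, although the actual selection rule used in the paper may differ.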

Keywords: Complex networks; Natural language processing; Word embedding; BERT; Word2Vec
Date: 2024

Downloads:
http://link.springer.com/10.1007/s11192-024-05055-7 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:129:y:2024:i:7:d:10.1007_s11192-024-05055-7

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-024-05055-7

Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-05-16
Handle: RePEc:spr:scient:v:129:y:2024:i:7:d:10.1007_s11192-024-05055-7