Paragraph-based intra- and inter- document similarity using neural vector paragraph embeddings

Bart Thijs

No 633963, Working Papers of ECOOM - Centre for Research and Development Monitoring from KU Leuven, Faculty of Economics and Business (FEB), ECOOM - Centre for Research and Development Monitoring

Abstract: Science mapping using document networks is based on the assumption that scientific papers are indivisible units with unique links to neighbour documents. Research on proximity in co-citation analysis and the study of lexical properties of sections and citation contexts indicate that this assumption is questionable. Moreover, the meaning of words and co-words depends on the context in which they appear. This study proposes the use of a neural network architecture for word and paragraph embeddings (Doc2Vec) to measure similarity among those smaller units of analysis. It is shown that paragraphs in the ‘Introduction’ and ‘Discussion’ sections are more similar to the abstract, and that the similarity among paragraphs is related, though not linearly, to the distance between them. The ‘Methodology’ section is the least similar to the other sections. Abstracts of citing and cited documents are more similar to each other than abstracts of random pairs, and the context in which a reference appears is most similar to the abstract of the cited document. This novel approach, with its higher granularity, can be used for bibliometric-aided retrieval and to assist in measuring interdisciplinarity through the application of network-based centrality measures.
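
As a rough illustration of the kind of comparison the abstract describes, the sketch below ranks paragraph vectors by their similarity to an abstract vector. It assumes the vectors have already been produced by some embedding model such as Doc2Vec, and it assumes cosine similarity as the measure; the abstract does not state which measure the paper uses. The function names, the labels p1 to p3, and the three-dimensional toy vectors are hypothetical and do not represent data or results from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_paragraphs_by_similarity(abstract_vec, paragraph_vecs):
    """Return (label, similarity) pairs, most similar to the abstract first."""
    scored = [
        (label, cosine_similarity(abstract_vec, vec))
        for label, vec in paragraph_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up toy vectors standing in for inferred embeddings (assumption:
# a real model would produce vectors with far more dimensions).
abstract_vec = [1.0, 0.0, 1.0]
paragraph_vecs = {
    "p1": [1.0, 0.0, 1.0],  # same direction as the abstract vector
    "p2": [0.0, 1.0, 0.0],  # orthogonal to the abstract vector
    "p3": [1.0, 1.0, 1.0],  # partially aligned with the abstract vector
}

ranking = rank_paragraphs_by_similarity(abstract_vec, paragraph_vecs)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

The same function could be applied to pairs of paragraphs within one document (intra-document similarity) or to abstracts and citation contexts across documents (inter-document similarity), provided all vectors come from the same embedding space.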

Pages: 14
Date: 2019-02-11
New Economics Papers: this item is included in nep-cmp and nep-sog
Note: paper number MSI_1901

Published in FEB Research Report MSI_1901

Downloads: (external link)
https://lirias.kuleuven.be/retrieve/531525 Published version (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:ete:ecoomp:633963

Handle: RePEc:ete:ecoomp:633963