
Measuring document similarity with weighted averages of word embeddings

Bryan Seegmiller, Dimitris Papanikolaou and Lawrence Schmidt

Explorations in Economic History, 2023, vol. 87, issue C

Abstract: We detail a methodology for estimating the textual similarity between two documents while accounting for the possibility that two different words can have a similar meaning. We illustrate the method’s usefulness in facilitating comparisons between documents with very different formats and vocabularies by textually linking occupation task and industry output descriptions with related technologies as described in patent texts; we also examine economic applications of the resultant document similarity measures. In a final application we demonstrate that the method also works well relative to alternatives for comparing documents within the same domain by showing that pairwise textual similarity between occupations’ task descriptions strongly predicts the probability that a given worker will transition from one occupation to another. Finally, we offer some suggestions on other potential uses and guidance in implementing the method.
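The general idea named in the title, a weighted average of word embeddings compared across documents, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The toy three-dimensional vectors, the TF-IDF-style weights, the cosine comparison, and the names `EMBEDDINGS`, `idf_weights`, `doc_vector`, and `cosine` are all assumptions made for the example. A real application would use pretrained embeddings and the weighting scheme described in the paper.

```python
import math
from collections import Counter

import numpy as np

# Toy embeddings invented for illustration (assumption); real work would
# load pretrained word vectors instead.
EMBEDDINGS = {
    "weld": np.array([0.9, 0.1, 0.0]),
    "solder": np.array([0.8, 0.2, 0.1]),
    "metal": np.array([0.7, 0.0, 0.3]),
    "teach": np.array([0.0, 0.9, 0.2]),
    "students": np.array([0.1, 0.8, 0.3]),
}
DIM = 3


def idf_weights(docs):
    """Smoothed inverse document frequency per word (assumed weighting)."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / df) + 1.0 for word, df in doc_freq.items()}


def doc_vector(tokens, idf):
    """Average of word vectors, weighted by term frequency times IDF."""
    counts = Counter(t for t in tokens if t in EMBEDDINGS)
    vec = np.zeros(DIM)
    total = sum(counts.values())
    if total == 0:
        return vec  # no known words: return the zero vector
    for word, count in counts.items():
        weight = (count / total) * idf.get(word, 1.0)
        vec += weight * EMBEDDINGS[word]
    return vec


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Hypothetical documents that share no words at all.
patent = ["solder"]
welder_tasks = ["weld", "metal"]
teacher_tasks = ["teach", "students"]

idf = idf_weights([patent, welder_tasks, teacher_tasks])
sim_welder = cosine(doc_vector(patent, idf), doc_vector(welder_tasks, idf))
sim_teacher = cosine(doc_vector(patent, idf), doc_vector(teacher_tasks, idf))
print(sim_welder, sim_teacher)
```

In this toy setup the patent text and the welding task description have no words in common, yet they score as more similar than the patent and the teaching description, because "solder" sits close to "weld" and "metal" in the embedding space. A plain word-overlap measure would assign both pairs a similarity of zero.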

Keywords: Textual analysis for economists; Document similarity; Natural language processing
Date: 2023

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0014498322000729
Full text for ScienceDirect subscribers only



Persistent link: https://EconPapers.repec.org/RePEc:eee:exehis:v:87:y:2023:i:c:s0014498322000729

DOI: 10.1016/j.eeh.2022.101494


Explorations in Economic History is currently edited by R.H. Steckel

More articles in Explorations in Economic History from Elsevier
Bibliographic data for series maintained by Catherine Liu.

 
Page updated 2025-03-31
Handle: RePEc:eee:exehis:v:87:y:2023:i:c:s0014498322000729