A Simulation-Based Slope Metric for Anchor List Reliability in Word Embedding Spaces

Marshall A. Taylor, Dustin S. Stoltz, Heather Harper, Sanuj Kumar, Sumanth Reddy Nandhikonda and Luke Burks
Additional contact information
Marshall A. Taylor: New Mexico State University
Dustin S. Stoltz: Lehigh University

No sc2ub_v1, SocArXiv from Center for Open Science

Abstract: Inducing semantic relations in word vector spaces and analyzing how other words or entire documents discursively engage these relations is a popular form of cultural analysis. We propose a reliability metric that is easily interpretable and agnostic to the type of relation. The metric, which we call the anchor reliability coefficient (or relco), is found by creating a synthetic document-term matrix of simulated documents that sequentially shift more of their probability mass from relation-relevant anchor terms to randomly drawn words, and then regressing the documents' similarity to an induced relation on the inverse randomness rank of the documents. We validate the metric at the word level with both expert- and crowd-sourced dictionaries and at the document level with expert-annotated social media posts.
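
The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `relco_sketch`, the toy embedding matrix, the planted direction, the use of cosine similarity, and the unnormalized OLS slope are all assumptions made for illustration; the paper itself should be consulted for the exact definition of relco.

```python
import numpy as np

def relco_sketch(emb, anchor_idx, direction, n_docs=20, doc_len=100, seed=0):
    """Hypothetical slope-style reliability sketch (not the published relco).

    emb        : (vocab, dim) word embedding matrix
    anchor_idx : indices of the anchor terms for the relation
    direction  : (dim,) vector standing in for the induced relation
    """
    rng = np.random.default_rng(seed)
    vocab_size = emb.shape[0]
    direction = direction / np.linalg.norm(direction)
    # Non-anchor words form the pool of "random" terms (an assumption).
    pool = np.setdiff1d(np.arange(vocab_size), anchor_idx)

    sims = np.empty(n_docs)
    for d in range(n_docs):
        # Document d moves a share d / (n_docs - 1) of its mass to random words.
        n_random = int(round(d / (n_docs - 1) * doc_len))
        n_anchor = doc_len - n_random
        counts = np.zeros(vocab_size)  # one row of the synthetic document-term matrix
        np.add.at(counts, rng.choice(anchor_idx, size=n_anchor), 1)
        np.add.at(counts, rng.choice(pool, size=n_random), 1)
        doc_vec = counts @ emb / doc_len  # count-weighted average word vector
        sims[d] = doc_vec @ direction / np.linalg.norm(doc_vec)  # cosine similarity

    # Inverse randomness rank: the least random document gets the highest rank.
    inv_rank = np.arange(n_docs, 0, -1)
    slope = np.polyfit(inv_rank, sims, 1)[0]  # OLS slope of similarity on rank
    return slope, sims

# Toy data: 200 words in 50 dimensions, with 10 anchors pushed along axis 0.
rng = np.random.default_rng(42)
emb = rng.normal(size=(200, 50))
direction = np.zeros(50)
direction[0] = 1.0
anchor_idx = np.arange(10)
emb[anchor_idx] += 5.0 * direction

slope, sims = relco_sketch(emb, anchor_idx, direction)
```

In this toy setup, anchors that genuinely load on the relation should produce similarities that fall as documents become more random, and therefore a positive slope. Anchors that are no more aligned with the relation than random words should produce a slope near zero.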

Date: 2025-06-27

Downloads: (external link)
https://osf.io/download/685da4c923e8a5c232a1c814/

Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:sc2ub_v1

DOI: 10.31219/osf.io/sc2ub_v1

Page updated 2025-06-28
Handle: RePEc:osf:socarx:sc2ub_v1