A Simulation-Based Slope Metric for Anchor List Reliability in Word Embedding Spaces
Marshall A. Taylor,
Dustin S. Stoltz,
Heather Harper,
Sanuj Kumar,
Sumanth Reddy Nandhikonda and
Luke Burks
Additional contact information
Marshall A. Taylor: New Mexico State University
Dustin S. Stoltz: Lehigh University
No sc2ub_v1, SocArXiv from Center for Open Science
Abstract:
Inducing semantic relations in word vector spaces and analyzing how other words or entire documents discursively engage these relations is a popular form of cultural analysis. We propose a reliability metric for the anchor lists used to induce such relations. The metric is easily interpretable and agnostic to the type of relation. We call it the anchor reliability coefficient (or relco). It is computed in two steps. First, we create a synthetic document-term matrix of simulated documents that sequentially shift more of their probability mass from relation-relevant anchor terms to randomly drawn words. Second, we regress the documents' similarity to the induced relation on the inverse randomness rank of the documents; the resulting slope is the metric. We validate the metric at the word level with both expert- and crowd-sourced dictionaries and at the document level with expert-annotated social media posts.
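The procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `relco_sketch`, the toy random embeddings, the uniform noise distribution, the use of cosine similarity between a count-weighted document centroid and a mean-difference direction, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def relco_sketch(emb, anchor_idx, direction, n_docs=20, doc_len=200, seed=0):
    """Hypothetical sketch: slope of similarity on inverse randomness rank."""
    rng = np.random.default_rng(seed)
    vocab = emb.shape[0]
    unit_dir = direction / np.linalg.norm(direction)
    sims = np.empty(n_docs)
    for k in range(n_docs):
        # Share of probability mass moved from anchors to random words.
        noise_share = k / (n_docs - 1)
        # Assumption: "randomly drawn words" are uniform over the vocabulary.
        probs = np.full(vocab, noise_share / vocab)
        probs[anchor_idx] += (1.0 - noise_share) / len(anchor_idx)
        # One row of the synthetic document-term matrix.
        counts = rng.multinomial(doc_len, probs)
        # Assumption: a document is its count-weighted mean word vector.
        doc_vec = counts @ emb / doc_len
        sims[k] = doc_vec @ unit_dir / np.linalg.norm(doc_vec)
    # Inverse randomness rank: the least random document gets the highest value.
    inv_rank = np.arange(n_docs, 0, -1)
    slope, _intercept = np.polyfit(inv_rank, sims, 1)
    return slope, sims


# Toy embedding space with a planted relation (illustrative data only).
rng = np.random.default_rng(42)
emb = rng.normal(size=(500, 50))
pole = np.zeros(50)
pole[0] = 1.0
pos, neg = np.arange(0, 10), np.arange(10, 20)
emb[pos] += 5 * pole
emb[neg] -= 5 * pole
# Assumption: the relation is the difference between the two anchor-set means.
direction = emb[pos].mean(axis=0) - emb[neg].mean(axis=0)

slope, sims = relco_sketch(emb, pos, direction)
print(f"slope of similarity on inverse randomness rank: {slope:.4f}")
```

In this toy setup, anchors that align with the relation lose similarity steadily as noise is added, so the fitted slope is positive. Anchors unrelated to the relation would be expected to give a slope near zero.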
Date: 2025-06-27
Downloads: (external link)
https://osf.io/download/685da4c923e8a5c232a1c814/
Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:sc2ub_v1
DOI: 10.31219/osf.io/sc2ub_v1