Knowing what you get when seeking semantic similarity: exploring classic NLP method biases
Johanne Saint-Charles, Pierre Mongeau and Louis Renaud-Desjardins
Chapter 3 in Handbook of Social Computing, 2024, pp 27-46 from Edward Elgar Publishing
Abstract:
Various Natural Language Processing (NLP) methods are used to establish similarity between texts in socio-semantic studies. This chapter addresses the methodological diversity in the field by asking to what extent classical NLP methods converge in identifying similarity between texts. We compare the results of four well-known NLP methods often used in the social sciences and humanities (Jaccard, LDA, LSA and TF–IDF) on corpora with different characteristics. The results show that these methods have specific biases and cannot be substituted for one another. Our observations invite scholars in the social sciences and humanities to consider new criteria when selecting an NLP method suited to their research objectives.
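As a rough illustration of why two of these measures need not agree, here is a minimal sketch comparing Jaccard similarity and TF-IDF cosine similarity on a toy corpus. It is not the chapter's code or data. The tokenizer (lowercased `\w+` tokens, no stemming or stop-word removal), the TF-IDF variant (raw count times ln(N / df)), and the three example strings are all assumptions chosen for illustration. LDA and LSA are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercased word tokens only; no stemming or stop-word removal.
    return re.findall(r"\w+", text.lower())


def jaccard(a, b):
    # Jaccard similarity of the distinct-token sets: |A & B| / |A | B|.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def tfidf_vectors(docs):
    # One common TF-IDF variant: raw term count * ln(N / df). Other weightings exist.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical toy corpus, built to make the two measures disagree.
docs = [
    "budget budget budget vote delay",
    "vote delay today",
    "budget budget budget cuts",
]
vecs = tfidf_vectors(docs)

for i in (1, 2):
    print(f"doc0 vs doc{i}: Jaccard={jaccard(docs[0], docs[i]):.2f}, "
          f"TF-IDF cosine={cosine(vecs[0], vecs[i]):.2f}")
# doc0 vs doc1: Jaccard=0.50, TF-IDF cosine=0.20
# doc0 vs doc2: Jaccard=0.25, TF-IDF cosine=0.67
```

Jaccard ranks doc1 as closer to doc0 because the two share more distinct words, while TF-IDF cosine ranks doc2 as closer because the repeated shared term carries most of the weight. This only shows in miniature how the choice of measure can change which texts count as similar; it does not reproduce the chapter's comparisons.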
Keywords: Business and Management; Innovations and Technology; Sociology and Social Policy
Date: 2024
Downloads: https://www.elgaronline.com/doi/10.4337/9781803921259.00009 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:elg:eechap:21469_3
Ordering information: This item can be ordered from http://www.e-elgar.com
Bibliographic data for series maintained by Darrel McCalla.