How to Normalize Co-Occurrence Data? An Analysis of Some Well-Known Similarity Measures
Nees Jan van Eck and Ludo Waltman
ERIM Report Series Research in Management, Erasmus Research Institute of Management (ERIM). ERIM is the joint research institute of the Rotterdam School of Management, Erasmus University, and the Erasmus School of Economics (ESE) at Erasmus University Rotterdam.
Abstract:
In scientometric research, the use of co-occurrence data is very common. In many cases, a similarity measure is employed to normalize the data. However, there is no consensus among researchers on which similarity measure is most appropriate for normalization purposes. In this paper, we theoretically analyze the properties of similarity measures for co-occurrence data, focusing in particular on four well-known measures: the association strength, the cosine, the inclusion index, and the Jaccard index. We also study the behavior of these measures empirically. Our analysis reveals that there exist two fundamentally different types of similarity measures, namely set-theoretic measures and probabilistic measures. The association strength is a probabilistic measure, while the cosine, the inclusion index, and the Jaccard index are set-theoretic measures. Both our theoretical and our empirical results indicate that co-occurrence data can best be normalized using a probabilistic measure. This provides strong support for the use of the association strength in scientometric research.
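To make the four measures concrete, the sketch below computes each one for a single pair of items. It assumes the commonly used definitions in terms of three counts: c_ij, the number of co-occurrences of items i and j, and s_i and s_j, the total number of occurrences of each item. The function names are illustrative only and are not taken from the paper or from any library.

```python
import math


def association_strength(c_ij: float, s_i: float, s_j: float) -> float:
    """Probabilistic measure: co-occurrences divided by the product of the totals.

    Up to a constant factor this is the ratio of observed co-occurrences to the
    number expected if items i and j occurred independently.
    """
    return c_ij / (s_i * s_j)


def cosine(c_ij: float, s_i: float, s_j: float) -> float:
    """Set-theoretic measure: co-occurrences over the geometric mean of the totals."""
    return c_ij / math.sqrt(s_i * s_j)


def inclusion_index(c_ij: float, s_i: float, s_j: float) -> float:
    """Set-theoretic measure: co-occurrences over the smaller of the two totals."""
    return c_ij / min(s_i, s_j)


def jaccard_index(c_ij: float, s_i: float, s_j: float) -> float:
    """Set-theoretic measure: intersection size over union size (s_i + s_j - c_ij)."""
    return c_ij / (s_i + s_j - c_ij)


if __name__ == "__main__":
    # Hypothetical counts: items i and j co-occur 10 times; i occurs 20 times, j 50 times.
    measures = [
        ("association strength", association_strength),
        ("cosine", cosine),
        ("inclusion index", inclusion_index),
        ("Jaccard index", jaccard_index),
    ]
    for name, measure in measures:
        print(f"{name}: {measure(10, 20, 50):.4f}")
```

With these hypothetical counts the script prints 0.0100, 0.3162, 0.5000 and 0.1667 respectively. The association strength is on a much smaller scale than the other three because it divides by the product of the totals. Only its relative values across item pairs matter when it is used for normalization.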
Keywords: Jaccard index; association strength; cosine; inclusion index; similarity measure
JEL-codes: C49 M M11 R4
Date: 2009-01-07
Download: https://repub.eur.nl/pub/14528/ERS-2009-001-LIS.pdf (PDF)
Persistent link: https://EconPapers.repec.org/RePEc:ems:eureri:14528