Document–document similarity approaches and science mapping: Experimental comparison of five approaches
Per Ahlgren and
Cristian Colliander
Journal of Informetrics, 2009, vol. 3, issue 1, 49-63
Abstract:
This paper treats document–document similarity approaches in the context of science mapping. Five approaches, involving nine methods, are compared experimentally. We compare text-based approaches, the citation-based bibliographic coupling approach, and approaches that combine text-based approaches and bibliographic coupling. Forty-three articles, published in the journal Information Retrieval, are used as test documents. We investigate how well the approaches agree with a ground truth subject classification of the test documents, when the complete linkage method is used, and under two types of similarities, first-order and second-order. The results show that it is possible to achieve a very good approximation of the classification by means of automatic grouping of articles. One text-only method and one combination method, under second-order similarities in both cases, give rise to cluster solutions that to a large extent agree with the classification.
Keywords: Citation data; Textual data; Data source combination; Cluster analysis; Science mapping (search for similar items in EconPapers)
Date: 2009
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (23)
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S1751157708000680
Full text for ScienceDirect subscribers only
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:eee:infome:v:3:y:2009:i:1:p:49-63
DOI: 10.1016/j.joi.2008.11.003
Access Statistics for this article
Journal of Informetrics is currently edited by Leo Egghe
More articles in Journal of Informetrics from Elsevier
Bibliographic data for series maintained by Catherine Liu ().