EconPapers    

Normalization of direct citations for clustering in publication-level networks: evaluation of six approaches

Peter Sjögårde and Per Ahlgren
Peter Sjögårde: Karolinska Institutet
Per Ahlgren: Uppsala University

Scientometrics, 2024, vol. 129, issue 3, No 32, 1949-1968

Abstract: Clustering of publication networks is an efficient way to obtain classifications of large collections of research publications. Such classifications can be used, for example, to detect research topics, normalize citation relations, or explore the publication output of a unit. Citation networks can be created using a variety of approaches. Best practices for obtaining classifications through clustering have been investigated, in particular the performance of different publication-publication relatedness measures. However, approaches to normalizing citation relations have not been evaluated to the same extent. In this paper, we evaluate five approaches to normalization of direct citation relations with respect to clustering solution quality in four data sets. A sixth approach, which uses no normalization, is also evaluated. To assess the quality of clustering solutions, we use three measures. (1) We compare the clustering solution to the reference lists of a set of publications using the Adjusted Rand Index. (2) Using the Silhouette width measure, we quantify the extent to which publications have relations to clusters other than the one they have been assigned to. (3) We propose a measure that captures publications that have probably been inaccurately assigned. The results clearly show that normalized direct citation relations are preferable to unnormalized ones. Furthermore, the results indicate that the fractional normalization approach, which can be considered the standard approach, causes specific inaccurate assignments, which we intend to address in this paper. The geometric normalization approach performs similarly to the fractional approach with respect to Adjusted Rand Index and Silhouette width but leads to fewer inaccurate assignments at moderate to high granularity levels. We therefore believe that the geometric approach may be preferred over the fractional approach.
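The abstract names the fractional and geometric normalization approaches but does not give their formulas. As a rough illustration only, the sketch below assumes common textbook forms on an undirected direct citation network, where k_i is the number of links of publication i: no normalization (weight 1), a fractional weight of 1/k_i + 1/k_j (each publication spreads a total weight of 1 over its links, summed over both directions), and a geometric weight of 1/sqrt(k_i * k_j). These definitions and all function names (`build_network`, `fractional`, `geometric`, etc.) are hypothetical and may differ from those evaluated in the paper.

```python
import math
from collections import defaultdict


def build_network(citations):
    """Undirected, unweighted direct citation network from (citing, cited) pairs.

    A citation in either direction links the two publications once.
    """
    neighbors = defaultdict(set)
    for citing, cited in citations:
        if citing != cited:  # skip a publication "citing" itself
            neighbors[citing].add(cited)
            neighbors[cited].add(citing)
    return neighbors


def edges(neighbors):
    """Yield each undirected edge once, as a pair (i, j) with i < j."""
    for i, js in neighbors.items():
        for j in js:
            if i < j:
                yield i, j


def unnormalized(neighbors):
    # Every direct citation link gets the same weight.
    return {(i, j): 1.0 for i, j in edges(neighbors)}


def fractional(neighbors):
    # ASSUMED form: i distributes weight 1 over its k_i links, j likewise;
    # the undirected weight sums both directions: 1/k_i + 1/k_j.
    return {(i, j): 1 / len(neighbors[i]) + 1 / len(neighbors[j])
            for i, j in edges(neighbors)}


def geometric(neighbors):
    # ASSUMED form: divide the link by the geometric mean of the two degrees.
    return {(i, j): 1 / math.sqrt(len(neighbors[i]) * len(neighbors[j]))
            for i, j in edges(neighbors)}


if __name__ == "__main__":
    net = build_network([("A", "B"), ("A", "C"), ("A", "D"), ("E", "D")])
    schemes = [("none", unnormalized), ("fractional", fractional), ("geometric", geometric)]
    for name, scheme in schemes:
        print(name, {edge: round(w, 3) for edge, w in sorted(scheme(net).items())})
```

In this toy network, A has three links and D has two, so under the assumed forms the link between A and D weighs 1/3 + 1/2 (about 0.833) fractionally and 1/sqrt(6) (about 0.408) geometrically. The resulting weighted edge list could then be handed to a clustering algorithm; the clustering setup used in the paper is not described in this abstract.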

Keywords: Clustering; Citation networks; Algorithmic classification; Publication-level classification; Citation normalization
Date: 2024

Downloads: (external link)
http://link.springer.com/10.1007/s11192-024-04932-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:129:y:2024:i:3:d:10.1007_s11192-024-04932-5

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-024-04932-5

Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Page updated 2025-03-20
Handle: RePEc:spr:scient:v:129:y:2024:i:3:d:10.1007_s11192-024-04932-5