Missing institutions in OpenAlex: possible reasons, implications, and solutions

Lin Zhang, Zhe Cao, Yuanyuan Shang, Gunnar Sivertsen and Ying Huang
Additional contact information
Lin Zhang: Wuhan University
Zhe Cao: Wuhan University
Yuanyuan Shang: Chinese Academy of Social Sciences Evaluation Studies
Gunnar Sivertsen: Nordic Institute for Studies in Innovation, Research and Education (NIFU)
Ying Huang: Wuhan University

Scientometrics, 2024, vol. 129, issue 10, No 4, 5869-5891

Abstract: The advent of open science calls for open data platforms with high data quality. OpenAlex, a fully open catalog of the global research system launched in January 2022, offers two main advantages, easy data accessibility and broad data coverage, and has been widely used in quantitative science studies. Notably, OpenAlex has been adopted as an important data source for the Leiden university ranking. However, OpenAlex has a severe data quality problem: institutions are missing from journal article metadata. This study investigates the possible reasons for the problem, its consequences, and its solutions by defining three types of institutional information: full institutional information (FII), partially missing institutional information (PMII) and completely missing institutional information (CMII). Our results show that the problem of missing institutions occurs in more than 60% of the journal articles in OpenAlex. The problem is particularly widespread in metadata from the early years and in the social sciences and humanities. Using sub-samples of the data, we further explore the possible reasons for the problem, the risk of distorted results that it poses, and possible solutions. The aim is to draw attention to the importance of data quality improvements in open resources, and thus to support the responsible use of open resources in quantitative science studies and in broader contexts.

Keywords: OpenAlex; Missing institutional information; Open science; Data quality
Date: 2024

Downloads: (external link)
http://link.springer.com/10.1007/s11192-023-04923-y Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:129:y:2024:i:10:d:10.1007_s11192-023-04923-y

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-023-04923-y

Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:scient:v:129:y:2024:i:10:d:10.1007_s11192-023-04923-y