Missing institutions in OpenAlex: possible reasons, implications, and solutions
Lin Zhang (),
Zhe Cao,
Yuanyuan Shang,
Gunnar Sivertsen and
Ying Huang
Additional contact information
Lin Zhang: Wuhan University
Zhe Cao: Wuhan University
Yuanyuan Shang: Chinese Academy of Social Sciences Evaluation Studies
Gunnar Sivertsen: Nordic Institute for Studies in Innovation, Research and Education (NIFU)
Ying Huang: Wuhan University
Scientometrics, 2024, vol. 129, issue 10, No 4, 5869-5891
Abstract:
Abstract The advent of open science calls for open data platforms with high data quality. As a fully open catalog of the global research system launched in January 2022, OpenAlex features two main advantages of easy data accessibility and broad data coverage, which has been widely used in quantitative science studies. Remarkably, OpenAlex is adopted as an important data source for Leiden university ranking. However, there is a severe data quality problem of missing institutions in journal article metadata in OpenAlex. This study investigates the possible reasons for the problem and its consequences and solutions by defining three types of institutional information—full institutional information (FII), partially missing institutional information (PMII) and completely missing institutional information (CMII). Our results show that the problem of missing institutions occurs in more than 60% of the journal articles in OpenAlex. The problem is particularly widespread in metadata from the early years and in the social sciences and humanities. Using sub-samples of the data, we further explore the possible reasons for the problem, the risk it might represent for distorted results, and possible solutions to the problem of missing institutions. The aim is to raise the importance of data quality improvements in open resources, and thus to support the responsible use of open resources in quantitative science studies and also in broader contexts.
Keywords: OpenAlex; Missing institutional information; Open science; Data quality (search for similar items in EconPapers)
Date: 2024
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://link.springer.com/10.1007/s11192-023-04923-y Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:129:y:2024:i:10:d:10.1007_s11192-023-04923-y
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192
DOI: 10.1007/s11192-023-04923-y
Access Statistics for this article
Scientometrics is currently edited by Wolfgang Glänzel
More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().