An open-source tool for merging data from multiple citation databases
Dušan Nikolić,
Dragan Ivanović and
Lidija Ivanović
Additional contact information
Dušan Nikolić: University of Novi Sad
Dragan Ivanović: University of Novi Sad
Lidija Ivanović: University of Novi Sad
Scientometrics, 2024, vol. 129, issue 7, No 37, 4573-4595
Abstract:
A bibliometric analysis based on records from a single citation database may be limited in its comprehensiveness and, therefore, in the reliability of its results. Combining and deduplicating records from multiple citation index databases for a bibliometric analysis is often a manual process that requires significant effort, especially for larger amounts of data. This paper presents an open-source tool for automatically preprocessing and deduplicating records based on similarity and user-configurable strategies. To validate the capabilities of the tool, the authors first manually deduplicated records from Scopus and Web of Science (WoS) in a use case comprising 11,307 records. The performance of the tool was then evaluated against the manually deduplicated results. With the best-performing similarity configuration on this deduplication use case, the tool reached up to 99% precision and a 98% F-measure, minimizing the time researchers would spend on data wrangling when combining Scopus and WoS. The tool has practical implications for bibliometric studies. For instance, the authors conducted a bibliometric analysis of the most productive researchers at a university using a single citation database, as well as merged data from multiple citation databases. The study used the VOSviewer tool and showed that merged data may produce different outcomes from those obtained in a study based on a single citation database.
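The abstract describes similarity-based deduplication evaluated against a manually deduplicated reference set. The following is a minimal sketch of that general idea, not the tool's actual implementation or API: the record fields (`id`, `title`, `year`, `doi`), the function names, the matching strategy (DOI first, then title similarity within the same year) and the 0.9 threshold are all assumptions chosen for illustration, and Python's standard `difflib.SequenceMatcher` stands in for whatever similarity measures the tool provides.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def is_duplicate(a, b, threshold=0.9):
    """Hypothetical strategy: compare DOIs when both records have one,
    otherwise require the same year and a similar normalized title."""
    if a.get("doi") and b.get("doi"):
        return a["doi"].lower() == b["doi"].lower()
    if a.get("year") != b.get("year"):
        return False
    ratio = SequenceMatcher(
        None, normalize(a["title"]), normalize(b["title"])
    ).ratio()
    return ratio >= threshold


def find_duplicates(scopus, wos, threshold=0.9):
    """Return the set of (scopus_id, wos_id) pairs judged to be duplicates.
    Compares every pair, so this is only suitable for small inputs."""
    return {
        (s["id"], w["id"])
        for s in scopus
        for w in wos
        if is_duplicate(s, w, threshold)
    }


def evaluate(predicted, gold):
    """Precision, recall and F-measure of predicted pairs against
    a manually deduplicated (gold) set of pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Here `evaluate` takes the pairs returned by `find_duplicates` and a manually curated set of pairs, mirroring the evaluation against manual deduplication that the abstract reports. A real run over thousands of records would need a blocking step (for example, grouping by year) rather than the all-pairs comparison used in this sketch.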
Keywords: Bibliometric analysis; Combining databases; Web of Science; Scopus; Software tool; TeslaSCIToolkit
Date: 2024
Downloads:
http://link.springer.com/10.1007/s11192-024-05076-2 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:129:y:2024:i:7:d:10.1007_s11192-024-05076-2
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192
DOI: 10.1007/s11192-024-05076-2
Scientometrics is currently edited by Wolfgang Glänzel
More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.