A comprehensive approach to preprocessing data for bibliometric analysis

Marzena Nowakowska
Marzena Nowakowska: Kielce University of Technology

Scientometrics, 2025, vol. 130, issue 9, No 17, 5225 pages

Abstract: Bibliometric analysis, also known as bibliometrics, has been conducted for several decades to evaluate scientific research based on data available on bibliographic platforms, such as the popular Web of Science or Scopus. Research papers that include bibliometric analysis typically ignore the problem of bibliographic data preprocessing, in particular one important aspect of it: data cleaning. Discussion of bibliographic data preprocessing in the literature is sparse and scattered; studies usually address single selected components of the entire endeavour. This study aims to fill the gap as a review article, extensively analysing the problem and presenting issues arising from the structure of bibliographic data, combining data from various sources, creating thesauri and conducting bibliometric analyses, drawing also on the author’s own experience. A brief description of the most popular software dedicated to bibliometrics, such as BibExcel, Bibliometrix, CiteSpace, CitNetExplorer, SciMAT, Sci2 Tool, and VOSviewer, is also provided, highlighting the operations available in these applications for the preliminary processing of bibliographic data. The work leads to the following conclusions. The task is more difficult and demanding than some authors suggest, or than is implied when authors claim, without further detail, that it has already been accomplished. Data cleaning operations are carried out at various stages of preprocessing, sometimes repetitively, and the order in which they are performed may be significant, as it determines the success or failure of the process, in particular when combining data from different sources. There is no software that allows automatic execution of the entire preprocessing procedure for bibliographic data. Moreover, manual work is inevitable at various stages of the process. The contribution of this work to the field of bibliometric analysis takes the form of a methodological synthesis, which considers the issue holistically and enables a more comprehensive understanding of it.
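
The abstract's point that cleaning order matters when combining sources can be illustrated with a minimal Python sketch. It is not taken from the article: the record fields (`title`, `doi`, `keywords`), the sample records, and the helper names are all hypothetical, and the matching rule (normalized DOI first, normalized title as fallback) is only one simple assumption among many possible ones.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and drop a leading resolver URL, if present."""
    return re.sub(r"^https?://(dx\.)?doi\.org/", "", doi.strip().lower())


def record_key(record: dict) -> str:
    """Build a match key: cleaned DOI if available, else cleaned title."""
    doi = normalize_doi(record.get("doi") or "")
    if doi:
        return "doi:" + doi
    return "title:" + normalize_title(record.get("title", ""))


def merge_sources(*sources):
    """Join record lists, keeping the first record seen for each key.

    Cleaning happens inside record_key, i.e. BEFORE duplicate detection;
    comparing the raw fields instead would miss the duplicate below.
    """
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record_key(record), record)
    return list(merged.values())


def apply_thesaurus(keywords, thesaurus):
    """Map keyword variants to a preferred label, preserving order."""
    result = []
    for keyword in keywords:
        cleaned = keyword.strip().lower()
        label = thesaurus.get(cleaned, cleaned)
        if label not in result:
            result.append(label)
    return result


# Hypothetical records standing in for exports from two platforms.
source_a = [
    {"title": "Data Cleaning in Bibliometrics", "doi": "10.1000/ABC.1",
     "keywords": ["Bibliometric analysis", "Data cleaning"]},
]
source_b = [
    {"title": "Data cleaning in bibliometrics.",
     "doi": "https://doi.org/10.1000/abc.1",
     "keywords": ["bibliometrics"]},
    {"title": "Thesaurus Construction", "doi": "",
     "keywords": ["thesaurus"]},
]

merged = merge_sources(source_a, source_b)

# Hypothetical thesaurus: variant -> preferred term.
thesaurus = {"bibliometric analysis": "bibliometrics"}
unified = apply_thesaurus(
    ["Bibliometric analysis", "bibliometrics", "Data cleaning"], thesaurus
)
```

In this sketch a record that carries a DOI in one source but not in the other would still escape detection, which is the kind of case where, as the abstract notes, manual work remains necessary.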

Keywords: Bibliometrics; Bibliographic data; Data cleaning; Data join; Disambiguation; Thesaurus
JEL-codes: Y20
Date: 2025

Downloads:
http://link.springer.com/10.1007/s11192-025-05415-x (abstract, text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:130:y:2025:i:9:d:10.1007_s11192-025-05415-x

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-025-05415-x

Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-10-24
Handle: RePEc:spr:scient:v:130:y:2025:i:9:d:10.1007_s11192-025-05415-x