How much data is sufficient for reliable bibliometric domain analysis? A multi-scenario experimental approach

Guo Chen, Shuya Chen, Zhili Chen, Lu Xiao and Jiming Hu
Additional contact information
Guo Chen: Nanjing University of Science and Technology
Shuya Chen: Nanjing University of Science and Technology
Zhili Chen: Nanjing University of Science and Technology
Lu Xiao: Nanjing University of Finance and Economics
Jiming Hu: Wuhan University

Scientometrics, 2025, vol. 130, issue 5, No 17, 2923-2946

Abstract: Determining the adequate data size for bibliometric domain analysis is a crucial yet unresolved issue in bibliometric research. In this paper, we propose a systematic approach to address this challenge by considering multiple task scenarios and conducting sampling experiments on five domains. We introduce two indexes to quantitatively evaluate the reliability of sub-bibliographic datasets with different sample sizes in fitting the complete bibliographic datasets, focusing on the impact of scale on dataset completeness. We find that while larger datasets tend to yield better results, diminishing returns are observed as the dataset size increases due to higher costs and time investments. Specific analysis tasks, such as subject category and country analysis (including co-occurrence relationships), can be conducted with smaller dataset sizes. However, analyzing authors and their co-occurrence relationships necessitates a larger dataset size. Nevertheless, different analysis scenarios require varying dataset sizes, especially when considering result ranking, co-occurrence relationship analysis, and top high-frequency elements. We also find that the appropriate dataset scale for analyzing different elements depends on their power-law distribution in the bibliographic dataset. Our findings offer practical guidance for researchers in selecting the appropriate dataset size for their specific analysis tasks, taking into account factors such as domain size, analyzed objects, the number of top values to be analyzed, and result ranking requirements.
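
The sampling idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the paper's two indexes, so the measure used here (the share of the full dataset's top-k elements that a random sub-dataset also ranks in its top-k) is a stand-in assumption, not the authors' method. The function names `top_k`, `top_k_overlap`, and `sampling_experiment` are hypothetical, and each record is assumed to be a list of elements such as countries or authors.

```python
import random
from collections import Counter


def top_k(records, k):
    """Return the k most frequent elements across all records, in rank order."""
    # Count each element once per record (document frequency).
    counts = Counter(element for record in records for element in set(record))
    # Sort by descending frequency, then by name, so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [element for element, _ in ranked[:k]]


def top_k_overlap(full_records, sample_records, k):
    """Share of the full dataset's top-k elements also in the sample's top-k."""
    full_top = set(top_k(full_records, k))
    if not full_top:
        return 1.0  # nothing to recover, so the sample trivially matches
    sample_top = set(top_k(sample_records, k))
    return len(full_top & sample_top) / len(full_top)


def sampling_experiment(full_records, fractions, k, repeats=20, seed=0):
    """Average top-k overlap per sampling fraction over repeated random samples."""
    rng = random.Random(seed)
    results = {}
    for fraction in fractions:
        size = max(1, round(fraction * len(full_records)))
        scores = [
            top_k_overlap(full_records, rng.sample(full_records, size), k)
            for _ in range(repeats)
        ]
        results[fraction] = sum(scores) / repeats
    return results
```

Calling `sampling_experiment(records, [0.1, 0.25, 0.5, 1.0], k=20)` on a list of records returns one averaged overlap score per fraction, which gives a rough view of how quickly a sub-dataset recovers the high-frequency elements of the complete dataset.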

Keywords: Bibliometric analysis; Domain analysis; Dataset completeness; Dataset size; Sampling experiments
Date: 2025

Downloads: (external link)
http://link.springer.com/10.1007/s11192-025-05335-w Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:130:y:2025:i:5:d:10.1007_s11192-025-05335-w

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-025-05335-w

Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-06-03
Handle: RePEc:spr:scient:v:130:y:2025:i:5:d:10.1007_s11192-025-05335-w