A data science-based framework to categorize academic journals

Zahid Halim and Shafaq Khan
Additional contact information
Zahid Halim: Ghulam Ishaq Khan Institute of Engineering Sciences and Technology
Shafaq Khan: University of Management and Technology

Scientometrics, 2019, vol. 119, issue 1, No 19, 393-423

Abstract: Academic journals play a significant role in disseminating new research insights and knowledge among scientists. The number of such journals has recently increased significantly. Scientists prefer to publish their scholarly work at reputed venues. Speed of publication is also an important factor that many consider when selecting a publication venue. Key indicators of a journal’s quality include the impact factor, Source Normalized Impact per Paper (SNIP), and Hirsch index (h-index). A journal’s ranking indicates its impact and quality relative to other venues in a specific discipline. Various measures can be used for ranking, such as field-specific statistics, intra-discipline ranking, or a combination of both. Earlier, journals were ranked manually, through institutional lists created by academic leaders. Politicization, biases, and personal interests were the key issues with such categorization. Later, the process evolved into database systems based on impact factor, SNIP, h-index, or any combination of these. All this created a demand for an external source for categorizing academic journals. This work presents a data science-based framework that evaluates journals based on their key bibliometric indicators and presents an automated approach to categorize them. The current proposal is restricted to journals published in the computer science domain. The journal features considered in the proposed framework are: publisher, impact factor, website, CiteScore, SJR (SCImago Journal & Country Rank), SNIP, h-index, country, age, cited half-life, immediacy factor/index, Eigenfactor score, article influence score, open access, percentile, citations, acceptance rate, peer review, and the number of articles published yearly. A dataset of 660 journals described by these 19 features is collected.
The dataset is preprocessed to fill in the missing values and perform scaling. Three feature selection techniques, namely Mutual Information (MI), minimum Redundancy Maximum Relevance (mRMR), and Statistical Dependency (SD), are used to rank the aforementioned features. The dataset is then vertically divided into three sets: all features, the top nine features, and the bottom ten features. Next, two clustering techniques, k-means and k-medoids, are employed to find the optimum number of coherent groups in the dataset. Based on a rigorous evaluation, four groups of journals are identified. Two classifiers, k-NN (k-Nearest Neighbor) and an Artificial Neural Network (ANN), are then trained to predict the category of an unknown journal; the ANN shows an average accuracy of 82.85%. A descriptive analysis of the clusters formed is also presented to gain insights about the four journal categories. The proposed framework provides an opportunity to independently categorize academic journals based on data science methods using multiple significant bibliometric indicators.
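
The pipeline described in the abstract (fill in missing values, scale, cluster, then classify an unseen journal) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the eight journals and their two features are invented toy values, missing entries are filled with column means, scaling is min-max, distances are Euclidean, k-means uses a deterministic farthest-point seeding, and two clusters are used instead of the four groups the paper reports. Feature ranking (MI, mRMR, SD), k-medoids, and the ANN are left out for brevity.

```python
import math
from collections import Counter

# Hypothetical toy data: [impact factor, h-index] for eight made-up journals.
# None marks a missing value. These numbers are NOT from the paper's dataset.
RAW = [
    [0.5, 10], [0.7, None], [0.6, 12], [0.4, 9],
    [5.0, 90], [5.5, 100], [None, 95], [4.8, 88],
]

def impute_mean(rows):
    """Fill each missing entry with the mean of the observed values in its column."""
    means = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        means.append(sum(seen) / len(seen))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]

def fit_min_max(rows):
    """Return per-column minima and ranges (a constant column gets range 1)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return lows, spans

def scale(row, lows, spans):
    """Map one row into the [0, 1] range learned by fit_min_max."""
    return [(v - lo) / sp for v, lo, sp in zip(row, lows, spans)]

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, k, max_iters=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    # Assumption: deterministic farthest-point seeding, chosen for reproducibility.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

filled = impute_mean(RAW)
lows, spans = fit_min_max(filled)
X = [scale(row, lows, spans) for row in filled]

# Two clusters suffice for this toy set; the paper identifies four groups.
centroids, labels = k_means(X, k=2)

# Classify two hypothetical unseen journals using the cluster labels as classes.
pred_high = knn_predict(X, labels, scale([4.5, 80], lows, spans))
pred_low = knn_predict(X, labels, scale([0.55, 11], lows, spans))
print(labels, pred_high, pred_low)
```

The cluster labels serve as the training targets for the classifier, which mirrors the order of steps in the abstract: groups are first discovered without supervision, and a classifier then learns to place a new journal into one of them. Note that the unseen journals are scaled with the minima and ranges learned from the training data, not with their own.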

Keywords: Journals categorization; Ranking; Data science; Clustering application
Date: 2019
Citations: 9 (in EconPapers)

Downloads: (external link)
http://link.springer.com/10.1007/s11192-019-03035-w Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:119:y:2019:i:1:d:10.1007_s11192-019-03035-w

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-019-03035-w

Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:scient:v:119:y:2019:i:1:d:10.1007_s11192-019-03035-w