An automatic and association-based procedure for hierarchical publication subject categorization
Cristina Urdiales and Eduardo Guzmán
Journal of Informetrics, 2024, vol. 18, issue 1
Abstract:
Subject categorization of scientific publications, i.e., journals, book series or conference proceedings, has become a major concern in academia, as publication impact and ranking are considered a basic criterion to evaluate paper quality. Publishers usually propose their own categorization, but they often include only their own publications and their categories might not be coherent with other proposals. Also, due to the dynamic nature of science, new categories may frequently appear. As traditional mechanisms for categorization have been questioned by many authors, a new research line has emerged to improve the category assignment process. Approaches usually rely on assessing publication similarity in terms of topics, co-citation, editorial boards, and/or shared author profiles. In this work, we propose a novel procedure for scientific publication hierarchical categorization based on the repetition or absence of relevant descriptors in association rules among publications. The key idea is that publication categories can be automatically defined by strong associations of nuclear topics. Also, some very specific subcategories can be defined by exclusion from any set of rules. This process can be used to construct a data-driven hierarchy of scientific publication categories from scratch or to improve any existing categorization by discovering new fields. In this paper, the proposed algorithm uses the SJR descriptors of all journals in the SCImago dataset and the three-level classification in the Scopus dataset (covering only 35% of the publications in the SCImago dataset) to discover new categories and assign every journal to the resulting enhanced hierarchy. We have focused on the field of “Physical Sciences and Engineering”, using the SCImago and Scopus datasets from 2019 (30,883 scientific publications). Our procedure combines data engineering techniques with association rules and generates potential new categories and outlier subcategories as a result.
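The idea of defining categories from strong associations among descriptors, and outlier subcategories from publications that fall outside every rule, can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy publications, the descriptor names, the restriction to rules between pairs of descriptors, and the thresholds `min_support` and `min_confidence` are all assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical toy data: each publication is described by a set of topic descriptors.
publications = {
    "journal_a": {"robotics", "control", "artificial intelligence"},
    "journal_b": {"robotics", "control"},
    "journal_c": {"robotics", "control", "mechanics"},
    "journal_d": {"optics", "photonics"},
    "journal_e": {"optics", "photonics", "materials"},
    "journal_f": {"acoustics"},
}


def pair_rules(transactions, min_support=0.3, min_confidence=0.8):
    """Return rules (antecedent, consequent, support, confidence) between descriptor pairs.

    support    = fraction of publications containing both descriptors
    confidence = support of the pair divided by support of the antecedent
    The thresholds are illustrative assumptions, not values from the paper.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset):
        # Number of publications whose descriptors include the whole itemset.
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    for a, b in permutations(items, 2):
        both = count({a, b})
        support = both / n
        if support < min_support:
            continue
        confidence = both / count({a})
        if confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


rules = pair_rules(list(publications.values()))

# Descriptors that take part in at least one strong rule hint at candidate categories.
covered = {d for a, b, _, _ in rules for d in (a, b)}

# Publications sharing no descriptor with any rule are candidate outlier subcategories.
outliers = sorted(name for name, descs in publications.items() if not (descs & covered))

for a, b, support, confidence in rules:
    print(f"{a} -> {b}  support={support:.2f}  confidence={confidence:.2f}")
print("outliers:", outliers)
```

In this toy setting, the descriptor pairs that survive both thresholds suggest two candidate categories, while the one publication that shares no descriptor with any rule is flagged as a candidate outlier. A full pipeline would mine larger itemsets and work with real SJR descriptors.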
To evaluate the suitability of our proposal, we have analyzed classification results based on the original category list and our extended two-level categorization via the Jensen–Shannon divergence and supervised machine-learning techniques. Results reveal the consistency and suitability of our categorization procedure.
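For reference, the Jensen–Shannon divergence between two discrete distributions can be computed as in the sketch below. How the authors build the distributions they compare is not stated in this abstract, so the function names and the example vectors are assumptions for illustration only.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q to their midpoint m.

    With base-2 logarithms the result lies in [0, 1]:
    0 for identical distributions, 1 for distributions with disjoint support.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical example: descriptor frequency profiles of two categories.
category_x = [0.5, 0.3, 0.2, 0.0]
category_y = [0.1, 0.2, 0.3, 0.4]

print(js_divergence(category_x, category_x))  # identical profiles
print(js_divergence(category_x, category_y))  # partially overlapping profiles
```

Because the midpoint distribution is positive wherever either input is positive, the divergence stays finite even when one profile assigns zero probability to a descriptor, which makes it convenient for comparing sparse category profiles.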
Keywords: Scientific publication subject categorization; Journal studies; Association rules
Date: 2024
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S1751157723000913
Full text for ScienceDirect subscribers only
Persistent link: https://EconPapers.repec.org/RePEc:eee:infome:v:18:y:2024:i:1:s1751157723000913
DOI: 10.1016/j.joi.2023.101466
Journal of Informetrics is currently edited by Leo Egghe
More articles in Journal of Informetrics from Elsevier
Bibliographic data for series maintained by Catherine Liu.