Identification of the most important external features of highly cited scholarly papers through 3 (i.e., Ridge, Lasso, and Boruta) feature selection data mining methods

Sepideh Fahimifar, Khadijeh Mousavi, Fatemeh Mozaffari and Marcel Ausloos
Additional contact information
Sepideh Fahimifar: University of Tehran
Khadijeh Mousavi: University of Tehran
Fatemeh Mozaffari: University of Tehran

Quality & Quantity: International Journal of Methodology, 2023, vol. 57, issue 4, No 35, 3685-3712

Abstract: Highly cited papers are influenced by external factors that are not directly related to the document's intrinsic quality. In this study, 50 characteristics for measuring the performance of 68 highly cited papers from the Journal of the American Medical Informatics Association, indexed in Web of Science (WOS) from 2009 to 2019, were investigated. In the first step, a Pearson correlation analysis is performed to eliminate variables with zero or weak correlation with the target ("dependent") variable (number of citations in WOS). Consequently, 32 variables are selected for the next step. Under the Ridge technique, 13 features show a positive effect on the number of citations. Across the three algorithms (Ridge, Lasso, and Boruta), 6 factors appear to be the most relevant. The "Number of citations by international researchers", "Journal self-citations in citing documents", and "Authors' self-citations in citing documents" are recognized as the most important features by all three methods used here. The "First author's scientific age", "Open-access paper", and "Number of first author's citations in WOS" are identified as important features of highly cited papers by only two methods, Ridge and Lasso. Note that we use specific machine learning algorithms as feature selection methods (Ridge, Lasso, and Boruta) to identify the most important features of highly cited papers; these tools had not previously been used for this purpose. In conclusion, we re-emphasize the performance resulting from such algorithms. Moreover, we do not advise authors to seek to increase the citations of their articles by manipulating the identified performance features. Indeed, ethical rules regarding these characteristics must be strictly obeyed.
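The two regression-based steps named in the abstract (a Pearson correlation filter, then Ridge and Lasso as feature selectors) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation: the correlation cutoff `min_abs_r`, the penalty `lam`, the coordinate-descent solver, and all function names are assumptions chosen for illustration. Boruta is omitted because it relies on a random forest and does not fit in a short example.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def correlation_filter(features, target, min_abs_r=0.2):
    """Step 1: keep features whose |r| with the target reaches a cutoff.
    The cutoff value is an assumption; the abstract only says 'zero or weak'."""
    kept = {}
    for name, col in features.items():
        r = pearson(col, target)
        if abs(r) >= min_abs_r:
            kept[name] = r
    return kept

def standardize(col):
    """Center a column and scale it to unit (population) variance.
    Assumes the column is not constant (the filter above removes constant columns)."""
    n = len(col)
    mean = sum(col) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / std for v in col]

def soft_threshold(rho, lam):
    """Lasso shrinkage: move rho toward zero by lam, clipping at zero."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def coordinate_descent(cols, y, shrink, n_iter=100):
    """Cyclic coordinate descent for penalized least squares.
    Assumes every column is standardized and y is centered."""
    n = len(y)
    beta = [0.0] * len(cols)
    resid = list(y)  # residual y - X @ beta, kept up to date incrementally
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Correlation of column j with the partial residual (column j's own effect added back).
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = shrink(rho)
            delta = new - beta[j]
            if delta:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta

def lasso(cols, y, lam):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1; can set coefficients exactly to zero."""
    return coordinate_descent(cols, y, lambda rho: soft_threshold(rho, lam))

def ridge(cols, y, lam):
    """Minimize (1/2n)||y - Xb||^2 + (lam/2) * ||b||^2; shrinks coefficients but keeps them nonzero."""
    return coordinate_descent(cols, y, lambda rho: rho / (1.0 + lam))

if __name__ == "__main__":
    # Toy data (hypothetical, not from the paper): three already-standardized columns.
    a, b, c = [1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]
    y = [3 * ai + 0.05 * ci for ai, ci in zip(a, c)]
    print("lasso:", lasso([a, b, c], y, lam=0.1))
    print("ridge:", ridge([a, b, c], y, lam=0.5))
```

In this sketch, a feature counts as selected by Lasso when its coefficient is nonzero, whereas Ridge only ranks features by coefficient size and sign, which is consistent with the abstract reporting features that "show a positive effect" under Ridge. On real data, `standardize` would be applied to each retained column and the target would be centered before fitting.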

Keywords: Highly cited articles; Feature selections; Altmetrics; Ridge; Lasso; Boruta
JEL-codes: C80 Y80
Date: 2023

Downloads: (external link)
http://link.springer.com/10.1007/s11135-022-01480-z Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:qualqt:v:57:y:2023:i:4:d:10.1007_s11135-022-01480-z

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11135

DOI: 10.1007/s11135-022-01480-z

Quality & Quantity: International Journal of Methodology is currently edited by Vittorio Capecchi

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:qualqt:v:57:y:2023:i:4:d:10.1007_s11135-022-01480-z