A novel machine-learning approach to measuring scientific knowledge flows using citation context analysis
Saeed-Ul Hassan (),
Iqra Safder,
Anam Akram and
Faisal Kamiran
Additional contact information
Saeed-Ul Hassan: Information Technology University
Iqra Safder: Information Technology University
Anam Akram: Information Technology University
Faisal Kamiran: Information Technology University
Scientometrics, 2018, vol. 116, issue 2, No 15, 973-996
Abstract:
Abstract We measure the knowledge flows between countries by analysing publication and citation data, arguing that not all citations are equally important. Therefore, in contrast to existing techniques that utilize absolute citation counts to quantify knowledge flows between different entities, our model employs a citation context analysis technique, using a machine-learning approach to distinguish between important and non-important citations. We use 14 novel features (including context-based, cue words-based and text-based) to train a Support Vector Machine (SVM) and Random Forest classifier on an annotated dataset of 20,527 publications downloaded from the Association for Computational Linguistics anthology ( http://allenai.org/data.html ). Our machine-learning models outperform existing state-of-the-art citation context approaches, with the SVM model reaching up to 61% and the Random Forest model up to a very encouraging 90% Precision–Recall Area Under the Curve, with 10-fold cross-validation. Finally, we present a case study to explain our deployed method for datasets of PLoS ONE full-text publications in the field of Computer and Information Sciences. Our results show that a significant volume of knowledge flows from the United States, based on important citations, are consumed by the international scientific community. Of the total knowledge flow from China, we find a relatively smaller proportion (only 4.11%) falling into the category of knowledge flow based on important citations, while The Netherlands and Germany show the highest proportions of knowledge flows based on important citations, at 9.06 and 7.35% respectively. Among the institutions, interestingly, the findings show that at the University of Malaya more than 10% of the knowledge produced falls into the category of important. We believe that such analyses are helpful to understand the dynamics of the relevant knowledge flows across nations and institutions.
Keywords: Knowledge flows; Machine learning; Citation context classification; Influential citations; Citation analysis (search for similar items in EconPapers)
Date: 2018
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (24)
Downloads: (external link)
http://link.springer.com/10.1007/s11192-018-2767-x Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:116:y:2018:i:2:d:10.1007_s11192-018-2767-x
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192
DOI: 10.1007/s11192-018-2767-x
Access Statistics for this article
Scientometrics is currently edited by Wolfgang Glänzel
More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().