Evaluating human versus machine learning performance in classifying research abstracts

Goh, Yeow Chong; Cai, Xin Qing; Theseira, Walter; Ko, Giovanni; Khor, Khiam Aik

Additional contact information
Yeow Chong Goh: Nanyang Technological University
Xin Qing Cai: Nanyang Technological University
Walter Theseira: Singapore University of Social Sciences
Khiam Aik Khor: Nanyang Technological University

Scientometrics, 2020, vol. 125, issue 2, No 22, 1197-1212

Abstract: We study whether humans or machine learning (ML) classification models are better at classifying scientific research abstracts according to a fixed set of discipline groups. We recruit undergraduate and postgraduate assistants for this task in separate stages, and compare their performance against the support vector machine (SVM) algorithm at classifying European Research Council Starting Grant project abstracts into their actual evaluation panels, which are organised by discipline group. On average, ML is more accurate than human classifiers across a variety of training and test datasets and across evaluation panels. ML classifiers trained on different training sets are also more reliable than human classifiers: different ML classifiers assign the same classification to a given abstract more consistently than different human classifiers do. While the top five percent of human classifiers can outperform ML in limited cases, selecting and training such classifiers is likely to be costly and difficult compared with training ML models. Our results suggest that ML models are a cost-effective and highly accurate method for addressing problems in comparative bibliometric analysis, such as harmonising the discipline classifications of research from different funding agencies or countries.
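
The two quantities the abstract compares, accuracy against the actual panels and reliability across classifiers, can be illustrated with a minimal Python sketch. This is an assumption-laden illustration rather than the authors' method: the function names, the panel labels "A", "B" and "C", and the toy data are all hypothetical, and reliability is approximated here by simple pairwise percent agreement, which may differ from the statistic used in the paper.

```python
from itertools import combinations


def accuracy(predicted, actual):
    """Share of abstracts whose predicted panel equals the actual panel."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def mean_pairwise_agreement(label_sets):
    """Average, over all pairs of classifiers, of the share of abstracts
    to which both classifiers in the pair assign the same panel."""
    pairs = list(combinations(label_sets, 2))
    if not pairs:
        raise ValueError("need at least two classifiers")
    return sum(accuracy(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: four abstracts, three classifiers (human or ML).
actual = ["A", "B", "C", "A"]
classifier_1 = ["A", "B", "C", "B"]
classifier_2 = ["A", "B", "A", "B"]
classifier_3 = ["A", "C", "C", "B"]

print(accuracy(classifier_1, actual))
print(mean_pairwise_agreement([classifier_1, classifier_2, classifier_3]))
```

Note that a group of classifiers can agree with one another often while still being inaccurate, which is why the two measures are reported separately.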

Keywords: Discipline classification; Text classification; Supervised classification
Date: 2020

Downloads:
http://link.springer.com/10.1007/s11192-020-03614-2 (abstract, text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:125:y:2020:i:2:d:10.1007_s11192-020-03614-2

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-020-03614-2

Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.