Deep learning in automated text classification: a case study using toxicological abstracts
Arun Varghese,
George Agyeman-Badu and
Michelle Cawley
Additional contact information
Arun Varghese: ICF
George Agyeman-Badu: ICF
Michelle Cawley: University of North Carolina
Environment Systems and Decisions, 2020, vol. 40, issue 4, 465-479
Abstract:
Machine learning technology has been widely adopted as a cost-saving document prioritization approach in systematic literature reviews related to human health risk assessments. Supervised approaches use a training dataset, a relatively small set of documents with human-annotated labels indicating the topic of each document, to build models that automatically predict the labels of a much larger set of unlabelled documents. Deep learning algorithms form a branch of machine learning that relies on complex neural network architectures to learn the features of the object to be classified. Although deep learning algorithms have until recently been applied mainly to image, video, and audio classification, they are increasingly being deployed on text classification problems. To explore the potential advantages and practicalities of using deep learning algorithms in the document prioritization step of systematic literature reviews, we compare the performance of the most commonly used deep learning architectures with more traditional machine learning models using a dataset of approximately 7000 abstracts from the scientific literature related to the chemical arsenic. The dataset was previously annotated by subject matter experts with regard to relevance to toxicological mode of action. We examine the relative performance of each algorithm type at alternative levels of training by sequentially expanding the training dataset to generate a learning curve. We find that deep learning offers increased performance in some instances but also requires more training data, longer model training time, more computational power, and more labor-intensive algorithm tuning than baseline traditional machine learning algorithms.
Keywords: Literature review; Systematic review; Automated document classification; Machine learning; Deep learning; Natural language processing
Date: 2020
Downloads:
http://link.springer.com/10.1007/s10669-020-09763-2 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:envsyd:v:40:y:2020:i:4:d:10.1007_s10669-020-09763-2
Ordering information: This journal article can be ordered from
https://www.springer.com/journal/10669
DOI: 10.1007/s10669-020-09763-2
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.