
Supervised clustering for automated document classification and prioritization: a case study using toxicological abstracts

Arun Varghese, Michelle Cawley and Tao Hong
Additional contact information
Arun Varghese: ICF
Michelle Cawley: ICF
Tao Hong: ICF

Environment Systems and Decisions, 2018, vol. 38, issue 3, 398-414

Abstract: Machine learning and natural language processing algorithms are currently widely used to retrieve relevant documents in a variety of contexts, including literature review and systematic review. Supervised machine learning algorithms perform well in terms of retrieval metrics such as recall and precision, but require the use of a sizeable training dataset, which is typically expensive to develop. Unsupervised machine learning algorithms do not require a training dataset and may perform well in terms of recall, but are typically lower in precision, and do not offer a transparent means for decision-makers to justify selection choices. In this paper, we illustrate the use of a hybrid document classification method based on semi-supervised learning that we refer to as “supervised clustering.” We show that supervised clustering combines the ease of use of unsupervised algorithms with the retrieval efficiency and transparency of supervised algorithms. We demonstrate through simulations the high performance and unbiased predictions of supervised clustering when provided even with only minimal training data. We further propose the use of ensemble learning as a means to maximize retrieval efficiency and to prioritize the review of those documents that are not eliminated by the supervised clustering algorithm.

Keywords: Risk assessment; Literature review; Systematic review; Automated document classification; Machine learning; Natural language processing
Date: 2018

Downloads: (external link)
http://link.springer.com/10.1007/s10669-017-9670-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:envsyd:v:38:y:2018:i:3:d:10.1007_s10669-017-9670-5

Ordering information: This journal article can be ordered from
https://www.springer.com/journal/10669

DOI: 10.1007/s10669-017-9670-5

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:envsyd:v:38:y:2018:i:3:d:10.1007_s10669-017-9670-5