Automatic Classification of National Health Service Feedback

Haynes, Christopher; Palomino, Marco A.; Stuart, Liz; Viira, David; Hannon, Frances; Crossingham, Gemma; Tantam, Kate

Additional contact information
Christopher Haynes: School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth PL4 8AA, UK
Marco A. Palomino: School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth PL4 8AA, UK
Liz Stuart: School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth PL4 8AA, UK
David Viira: Faculty of Health, University Hospitals Plymouth, Derriford Rd., Plymouth PL6 8DH, UK
Frances Hannon: Faculty of Health, University Hospitals Plymouth, Derriford Rd., Plymouth PL6 8DH, UK
Gemma Crossingham: Faculty of Health, University Hospitals Plymouth, Derriford Rd., Plymouth PL6 8DH, UK
Kate Tantam: Faculty of Health, University Hospitals Plymouth, Derriford Rd., Plymouth PL6 8DH, UK

Mathematics, 2022, vol. 10, issue 6, 1-23

Abstract: Text datasets come in an abundance of shapes, sizes and styles. However, determining what factors limit classification accuracy remains a difficult task which is still the subject of intensive research. Using a challenging UK National Health Service (NHS) dataset, which contains many characteristics known to increase the complexity of classification, we propose an innovative classification pipeline. This pipeline switches between different text pre-processing, scoring and classification techniques during execution. Using this flexible pipeline, a high level of accuracy has been achieved in the classification of a range of datasets, attaining a micro-averaged F1 score of 93.30% on the Reuters-21578 “ApteMod” corpus. An evaluation of this flexible pipeline was carried out using a variety of complex datasets compared against an unsupervised clustering approach. The paper describes how classification accuracy is impacted by an unbalanced category distribution, the rare use of generic terms and the subjective nature of manual human classification.

Keywords: NLP; classification; clustering; text pre-processing; machine learning; National Health Service (NHS)
JEL-codes: C
Date: 2022

Downloads: (external link)
https://www.mdpi.com/2227-7390/10/6/983/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/6/983/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:6:p:983-:d:774482

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.