Predicting Risks of Machine Translations of Public Health Resources by Developing Interpretable Machine Learning Classifiers
Wenxiu Xie, Meng Ji, Riliu Huang, Tianyong Hao and Chi-Yin Chow
Additional contact information
Wenxiu Xie: Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong 518057, China
Meng Ji: School of Languages and Cultures, University of Sydney, Sydney 2006, Australia
Riliu Huang: School of Languages and Cultures, University of Sydney, Sydney 2006, Australia
Tianyong Hao: School of Computer Science, South China Normal University, Guangzhou 510631, China
Chi-Yin Chow: Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong 518057, China
IJERPH, 2021, vol. 18, issue 16, 1-19
Abstract:
We aimed to develop machine learning classifiers as a risk-prevention mechanism for medical professionals with little or no knowledge of their patients' languages. The classifiers predict the likelihood of clinically significant mistakes or incomprehensible machine translation (MT) outputs from features of the English source information given as input to the MT systems. A multinomial Naïve Bayes (MNB) classifier was developed to provide intuitive probabilistic predictions of erroneous health translation outputs, based on computational modelling of a small number of optimised features of the original English source texts. The best-performing MNB classifier, using a small number of optimised features (8), achieved a statistically higher AUC (M = 0.760, SD = 0.03) than the classifier using high-dimensional natural features (135) (M = 0.631, SD = 0.006, p < 0.0001, SE = 0.004) and the automatically optimised classifier (22) (M = 0.7231, SD = 0.0084, p < 0.0001, SE = 0.004). Furthermore, MNB (8) had statistically higher sensitivity (M = 0.885, SD = 0.100) than the full-feature classifier (135) (M = 0.577, SD = 0.155, p < 0.0001, SE = 0.005) and the automatically optimised classifier (22) (M = 0.731, SD = 0.139, p < 0.0001, SE = 0.0023). Finally, MNB (8) reached statistically higher specificity (M = 0.667, SD = 0.138) than the full-feature classifier (135) (M = 0.567, SD = 0.139, p = 0.0002, SE = 0.026) and the automatically optimised classifier (22) (M = 0.633, SD = 0.141, p = 0.0133, SE = 0.026).
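The three metrics reported in the abstract (AUC, sensitivity, specificity) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the convention that label 1 marks an erroneous translation are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Assumed convention: 1 = erroneous translation, 0 = acceptable.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Sensitivity: share of true positives that were detected.
    # Specificity: share of true negatives that were correctly cleared.
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def threshold(scores: Sequence[float], cutoff: float = 0.5) -> list:
    """Turn predicted probabilities into hard 0/1 labels (cutoff is an assumption)."""
    return [1 if s >= cutoff else 0 for s in scores]
```

With a probabilistic classifier such as MNB, `scores` would be the predicted probability of the positive class; `auc` uses those probabilities directly, while sensitivity and specificity depend on the cutoff chosen in `threshold`.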
Keywords: multinominal naïve bayes classifier; public health education and promotion; machine learning; digital vulnerability
JEL-codes: I I1 I3 Q Q5
Date: 2021
Downloads:
https://www.mdpi.com/1660-4601/18/16/8789/pdf (application/pdf)
https://www.mdpi.com/1660-4601/18/16/8789/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:18:y:2021:i:16:p:8789-:d:618157
IJERPH is currently edited by Ms. Jenna Liu
Bibliographic data for series maintained by MDPI Indexing Manager.