Utility of machine learning in developing a predictive model for early-age-onset colorectal neoplasia using electronic health records
Hisham Hussan,
Jing Zhao,
Abraham K Badu-Tawiah,
Peter Stanich,
Fred Tabung,
Darrell Gray,
Qin Ma,
Matthew Kalady and
Steven K Clinton
PLOS ONE, 2022, vol. 17, issue 3, 1-18
Abstract:
Background and aims: The incidence of colorectal cancer (CRC) is increasing in adults younger than 50, and early screening remains challenging due to cost and under-utilization. To identify individuals aged 35–50 years who may benefit from early screening, we developed a prediction model using machine learning and electronic health record (EHR)-derived factors. Methods: We enrolled 3,116 adults aged 35–50 at average risk for CRC who underwent colonoscopy between 2017 and 2020 at a single center. Prediction outcomes were (1) CRC and (2) CRC or high-risk polyps. We derived our predictors from EHRs (e.g., demographics, obesity, laboratory values, medications, and zip code-derived factors). We constructed four machine learning-based models using a training set (random sample of 70% of participants): regularized discriminant analysis, random forest, neural network, and gradient boosting decision tree. In the testing set (remaining 30% of participants), we measured predictive performance by comparing the C-statistic of each model to that of a reference model (logistic regression). Results: The study sample was 55.1% female and 32.8% non-white, and included 16 (0.5%) CRC cases and 478 (15.3%) cases of CRC or high-risk polyps. All machine learning models predicted CRC with higher discriminative ability than the reference model [e.g., C-statistics (95% CI); neural network: 0.75 (0.48–1.00) vs. reference: 0.43 (0.18–0.67); P = 0.07]. Furthermore, all machine learning approaches, except for gradient boosting, predicted CRC or high-risk polyps significantly better than the reference model [e.g., C-statistics (95% CI); regularized discriminant analysis: 0.64 (0.59–0.69) vs. reference: 0.55 (0.50–0.59); P
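The evaluation design described in the abstract (a random 70%/30% split, with models compared by C-statistic) can be illustrated with a minimal sketch. This is not the authors' code: the function names `train_test_split` and `c_statistic`, the random seed, and the toy labels and scores are all hypothetical, and the abstract does not say how the split or the C-statistic was actually computed. The C-statistic here is taken in its usual sense, the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case, with ties counted as one half.

```python
import random


def train_test_split(ids, train_fraction=0.7, seed=0):
    """Randomly partition ids into a training list and a testing list.

    The seed is an arbitrary assumption; the abstract does not report one.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def c_statistic(labels, scores):
    """Concordance between binary labels (1 = case) and predicted scores.

    Compares every case/non-case pair: a pair counts 1 when the case has
    the higher score and 0.5 when the two scores are tied.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score > other_score:
                concordant += 1.0
            elif case_score == other_score:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Split 3,116 hypothetical participant indices 70/30, as in the abstract.
train_ids, test_ids = train_test_split(range(3116))
print(len(train_ids), len(test_ids))  # 2181 935

# Toy scores only; these are not data from the study.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.8, 0.3, 0.3, 0.1]
print(c_statistic(toy_labels, toy_scores))  # 0.875
```

A C-statistic of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. With as few as 16 CRC cases in the whole sample, the testing set contains very few case/non-case pairs for that outcome, which is consistent with the wide confidence intervals reported above.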
Date: 2022
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0265209 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 65209&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0265209
DOI: 10.1371/journal.pone.0265209