Predicting Survival in Veterans with Follicular Lymphoma Using Structured Electronic Health Record Information and Machine Learning
Chunyang Li,
Vikas Patil,
Kelli M. Rasmussen,
Christina Yong,
Hsu-Chih Chien,
Debbie Morreall,
Jeffrey Humpherys,
Brian C. Sauer,
Zachary Burningham and
Ahmad S. Halwani
Additional contact information
Chunyang Li: Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
Vikas Patil: Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
Kelli M. Rasmussen: Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
Christina Yong: Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
Hsu-Chih Chien: Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
Debbie Morreall: Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
Jeffrey Humpherys: Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
Brian C. Sauer: Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
Zachary Burningham: Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
Ahmad S. Halwani: Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
IJERPH, 2021, vol. 18, issue 5, 1-19
Abstract:
The most accurate prognostic approach for follicular lymphoma (FL), progression of disease at 24 months (POD24), requires two years’ observation after initiating first-line therapy (L1) to predict outcomes. We applied machine learning to structured electronic health record (EHR) data to predict individual survival at L1 initiation. From a nationwide cohort of FL patients diagnosed 2006–2014 in the Veterans Health Administration, comprising 523 observations and 1933 variables, we grouped the variables into three sets: traditionally used prognostic variables (“curated”), commonly measured labs (“labs”), and International Classification of Diseases diagnostic codes (“ICD”). We compared the performance of random survival forests (RSF) with that of a traditional Cox model on four datasets: curated, curated + labs, curated + ICD, and curated + ICD + labs; we also fit a Cox model on curated + POD24 for comparison. We measured performance by area under the receiver operating characteristic curve (AUC) and examined variable importance and partial dependence plots. RSF with curated + labs performed best, with mean AUC 0.73 (95% CI: 0.71–0.75). It approximated, but did not surpass, Cox with POD24 (mean AUC 0.74 [95% CI: 0.71–0.77]). RSF using EHR data outperformed models based on traditional prognostic variables alone, setting the foundation for incorporating our algorithm into the EHR. It also points to future scenarios in which clinicians have an EHR-based tool that approximates the predictive ability of the most accurate known indicator using information available 24 months earlier.
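As a rough illustration of how a "mean AUC (95% CI)" figure of the kind reported above can be produced, the following Python sketch computes a rank-based AUC and summarizes repeated AUC estimates. It is not the authors' code. It assumes a binary outcome (death by a fixed time horizon) with no censoring and a normal-approximation confidence interval; the abstract does not state the time horizon or how the intervals were derived. The function names and all numbers in the example are hypothetical.

```python
from statistics import mean, stdev


def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_ci95(values):
    """Mean and normal-approximation 95% confidence interval of a list of
    AUC estimates (for example, one per resample). The interval method is
    an assumption, not taken from the abstract."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width


# Hypothetical toy data: predicted risk and observed death (1) or survival (0).
risk = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
died = [1, 1, 1, 0, 0, 0]
print(f"AUC on toy data: {auc(risk, died):.3f}")

# Hypothetical AUCs from four repeated evaluations of one model.
m, lo, hi = mean_ci95([0.70, 0.72, 0.74, 0.76])
print(f"mean AUC {m:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

With real survival data, censored patients would need separate handling, for instance through a time-dependent AUC, which this sketch deliberately leaves out.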
Keywords: machine learning; prognosis; follicular lymphoma; survival analysis; random survival forest; predictive analytics; veterans health administration; electronic health records; healthcare; medical and health data
JEL-codes: I I1 I3 Q Q5
Date: 2021
Downloads:
https://www.mdpi.com/1660-4601/18/5/2679/pdf (application/pdf)
https://www.mdpi.com/1660-4601/18/5/2679/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:18:y:2021:i:5:p:2679-:d:512214
IJERPH is currently edited by Ms. Jenna Liu