Machine Learning at the Service of Survival Analysis: Predictions Using Time-to-Event Decomposition and Classification Applied to a Decrease of Blood Antibodies against COVID-19
Lubomír Štěpánek,
Filip Habarta,
Ivana Malá,
Ladislav Štěpánek,
Marie Nakládalová,
Alena Boriková and
Luboš Marek
Additional contact information
Lubomír Štěpánek: Department of Statistics and Probability, Faculty of Informatics and Statistics, Prague University of Economics and Business, W. Churchill’s Square 1938/4, 130 67 Prague, Czech Republic
Filip Habarta: Department of Statistics and Probability, Faculty of Informatics and Statistics, Prague University of Economics and Business, W. Churchill’s Square 1938/4, 130 67 Prague, Czech Republic
Ivana Malá: Department of Statistics and Probability, Faculty of Informatics and Statistics, Prague University of Economics and Business, W. Churchill’s Square 1938/4, 130 67 Prague, Czech Republic
Ladislav Štěpánek: Department of Occupational Medicine, University Hospital Olomouc and Faculty of Medicine and Dentistry, Palacký University Olomouc, I. P. Pavlova 185/6, 779 00 Olomouc, Czech Republic
Marie Nakládalová: Department of Occupational Medicine, University Hospital Olomouc and Faculty of Medicine and Dentistry, Palacký University Olomouc, I. P. Pavlova 185/6, 779 00 Olomouc, Czech Republic
Alena Boriková: Department of Occupational Medicine, University Hospital Olomouc and Faculty of Medicine and Dentistry, Palacký University Olomouc, I. P. Pavlova 185/6, 779 00 Olomouc, Czech Republic
Luboš Marek: Department of Statistics and Probability, Faculty of Informatics and Statistics, Prague University of Economics and Business, W. Churchill’s Square 1938/4, 130 67 Prague, Czech Republic
Mathematics, 2023, vol. 11, issue 4, 1-27
Abstract:
The Cox proportional hazard model can predict whether an individual belonging to a given group is likely to register an event of interest at a given time. However, the Cox model is limited by relatively strict statistical assumptions. In this study, we propose decomposing the time-to-event variable into “time” and “event” components and using the latter as a target variable for various machine-learning classification algorithms, which, unlike the Cox model, are almost assumption-free. The time component is continuous and is used as one of the covariates, i.e., input variables, for classification algorithms such as logistic regression, naïve Bayes classifiers, decision trees, random forests, and artificial neural networks, while the event component is binary and thus may be modeled using these classification algorithms. We apply the proposed method to predict whether IgG and IgM blood antibodies against COVID-19 (SARS-CoV-2), respectively, decrease below a laboratory cut-off for a given individual at a given time point. Using train-test splitting of the COVID-19 dataset (n = 663 individuals), models for the mentioned algorithms, including the Cox proportional hazard model, are built on the train subsets and tested on the test subsets. To make the performance evaluation more robust, the models’ predictive accuracies are estimated using 10-fold cross-validation on the split dataset. Even though the time-to-event variable decomposition might ignore the effect of individual data censoring, many algorithms show similar or even higher predictive accuracy compared to the traditional Cox proportional hazard model.
In COVID-19 IgG decrease prediction, multivariate logistic regression (accuracy 0.811), support vector machines (accuracy 0.845), random forests (accuracy 0.836), and artificial neural networks (accuracy 0.806) outperform the Cox proportional hazard model (accuracy 0.796), while in COVID-19 IgM antibody decrease prediction, neither Cox regression nor the other algorithms perform well (the best accuracy is 0.627, for Cox regression). An accurate prediction of mainly COVID-19 IgG antibody decrease can help the healthcare system identify, with no need for extensive blood testing, individuals who, for instance, could postpone booster vaccination if a new COVID-19 variant emerges, or who should be flagged as high risk due to low COVID-19 antibodies.
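The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the single covariate `age` is hypothetical, and a small hand-written logistic regression stands in for the library classifiers used in the paper. The time component is placed among the input features, the binary event component is the target, and accuracy is estimated with 10-fold cross-validation. As the abstract notes, this treatment ignores censoring: an individual without an observed event is simply labeled 0.

```python
import math
import random


def decompose(records):
    """Split (time, event, covariates) records into features X and target y.

    The time component becomes the first input feature; the binary event
    component becomes the classification target.
    """
    X = [[t] + list(cov) for t, _, cov in records]
    y = [int(e) for _, e, _ in records]
    return X, y


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=1.0, epochs=300):
    """Fit logistic regression by batch gradient descent; w[0] is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Predict 1 (event) if the estimated probability is at least 0.5."""
    return int(sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) >= 0.5)


def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for fold in folds:
        held = set(fold)
        X_train = [X[i] for i in idx if i not in held]
        y_train = [y[i] for i in idx if i not in held]
        w = fit_logistic(X_train, y_train)
        correct = sum(predict(w, X[i]) == y[i] for i in fold)
        accs.append(correct / len(fold))
    return sum(accs) / k


# Synthetic data (an assumption for illustration, not the COVID-19 dataset):
# the chance of an event grows with follow-up time, scaled here to [0, 1].
rng = random.Random(42)
records = []
for _ in range(200):
    t = rng.uniform(0.0, 1.0)    # time component
    age = rng.uniform(0.0, 1.0)  # hypothetical covariate, scaled to [0, 1]
    p_event = sigmoid(6.0 * (t - 0.5) + 1.0 * (age - 0.5))
    e = 1 if rng.random() < p_event else 0  # event component
    records.append((t, e, (age,)))

X, y = decompose(records)
acc = cross_val_accuracy(X, y, k=10)
print(f"10-fold cross-validated accuracy: {acc:.3f}")
```

Because time is an ordinary feature here, a prediction for a given individual at a given time point amounts to calling `predict` with that time and the individual's covariates.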
Keywords: time-to-event variable decomposition; time-to-event variable prediction; machine-learning classification algorithms; COVID-19; antibody blood level decrease; multivariate logistic regression; naïve Bayes classifier; support vector machines; decision trees and random forests; artificial neural networks
JEL-codes: C
Date: 2023
Downloads:
https://www.mdpi.com/2227-7390/11/4/819/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/4/819/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:4:p:819-:d:1059153
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.