Two-part predictive modeling for COVID-19 cases and deaths in the U.S
Teresa-Thuong Le and Xiyue Liao
PLOS ONE, 2024, vol. 19, issue 6, 1-16
Abstract:
COVID-19 prediction has been essential in aiding the prevention and control of the disease. The motivation of this case study is to develop predictive models for COVID-19 cases and deaths based on a cross-sectional data set of 28,955 observations and 18 variables, compiled from five data sources on Kaggle. A two-part modeling framework is introduced to model the highly skewed distributions of COVID-19 cases and deaths: the first part is a logistic classifier, and the second part uses machine learning or statistical smoothing methods. We also aim to understand which factors are most relevant to COVID-19's occurrence and fatality. Evaluation criteria such as root mean squared error (RMSE) and mean absolute error (MAE) are used. We find that the two-part XGBoost model performs best at predicting the entire distribution of COVID-19 cases and deaths. The most important factors relevant to either COVID-19 cases or deaths include population and the rate of primary care physicians.
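To make the two-part idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' code. Part 1 is a logistic classifier for whether an observation is positive (y > 0), fit here by plain gradient descent. Part 2 is fit only on the positive observations; the paper uses XGBoost or statistical smoothing there, but this sketch swaps in ordinary least squares as a simple stand-in. Combining the parts as E[y | x] = P(y > 0 | x) * E[y | y > 0, x] is the standard hurdle-style rule and is an assumption about how the predictions are merged. All function names are hypothetical, and the data are synthetic, not the paper's Kaggle data set.

```python
import numpy as np

def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

def fit_logistic(X, z, lr=0.1, n_iter=2000):
    """Part 1: logistic classifier for P(y > 0 | x), fit by batch gradient descent."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - z) / len(z)  # gradient of the mean log-loss
    return w

def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-add_intercept(X) @ w))

def fit_positive_part(X, y):
    """Part 2 stand-in (assumption): OLS on the positive observations only."""
    beta, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return beta

def two_part_predict(w, beta, X):
    """Assumed combination rule: E[y | x] = P(y > 0 | x) * E[y | y > 0, x]."""
    mu = np.maximum(add_intercept(X) @ beta, 0.0)  # clip: counts cannot be negative
    return predict_proba(w, X) * mu

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

# Synthetic zero-inflated, right-skewed outcome (illustration only)
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
is_positive = rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * X[:, 0])))
y = np.where(is_positive, np.exp(1.0 + 0.8 * X[:, 1] + rng.normal(scale=0.3, size=n)), 0.0)

w = fit_logistic(X, (y > 0).astype(float))
beta = fit_positive_part(X[y > 0], y[y > 0])
yhat = two_part_predict(w, beta, X)
print(f"RMSE={rmse(y, yhat):.3f}  MAE={mae(y, yhat):.3f}")
```

Splitting the model this way lets the classifier absorb the excess zeros while the second part only has to describe the positive, skewed tail. RMSE penalizes large errors more heavily than MAE, so RMSE is always at least as large as MAE on the same predictions.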
Date: 2024
Downloads:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0302324 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 02324&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0302324
DOI: 10.1371/journal.pone.0302324
More articles in PLOS ONE from Public Library of Science