A Model for Rapid Selection and COVID-19 Prediction with Dynamic and Imbalanced Data

Kim, Jeonghun; Kwon, Ohbyung

Additional contact information
Jeonghun Kim: Department of Management, Kyung Hee University, Seoul 02447, Korea
Ohbyung Kwon: School of Management, Kyung Hee University, Seoul 02447, Korea

Sustainability, 2021, vol. 13, issue 6, 1-18

Abstract: The COVID-19 pandemic is threatening our quality of life and economic sustainability. The rapid spread of COVID-19 around the world requires each country or region to establish appropriate anti-proliferation policies in a timely manner. It is important, in making COVID-19-related health policy decisions, to predict the number of confirmed COVID-19 patients as accurately and quickly as possible. Predictions are already being made using several traditional models such as the susceptible, infected, and recovered (SIR) and susceptible, exposed, infected, and resistant (SEIR) frameworks, but these predictions may not be accurate due to the simplicity of the models, so a prediction model with more diverse input features is needed. However, it is difficult to propose a universal predictive model globally because there are differences in data availability by country and region. Moreover, the training data for predicting confirmed patients is typically an imbalanced dataset consisting mostly of normal data; this imbalance negatively affects the accuracy of prediction. Hence, the purposes of this study are to extract rules for selecting appropriate prediction algorithms and data imbalance resolution methods according to the characteristics of the datasets available for each country or region, and to predict the number of COVID-19 patients based on these algorithms. To this end, a decision tree-type rule was extracted to identify 13 data characteristics and a discrimination algorithm was selected based on those characteristics. With this system, we predicted the COVID-19 situation in four regions: Africa, China, Korea, and the United States. The proposed method has higher prediction accuracy than the random selection method, the ensemble method, or the greedy method of discriminant analysis, and prediction takes very little time.

Keywords: COVID-19 pandemic; classification algorithms; data availability; big data analytics; decision tree; data imbalance
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2021

Downloads:
https://www.mdpi.com/2071-1050/13/6/3099/pdf (application/pdf)
https://www.mdpi.com/2071-1050/13/6/3099/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:13:y:2021:i:6:p:3099-:d:515306

Sustainability is currently edited by Ms. Alexandra Wu

Bibliographic data for series maintained by MDPI Indexing Manager.