Prediction of Number of Cases of 2019 Novel Coronavirus (COVID-19) Using Social Media Search Index
Lei Qin,
Qiang Sun,
Yidan Wang,
Ke-Fei Wu,
Mingchih Chen,
Ben-Chang Shia and
Szu-Yuan Wu
Affiliations:
Lei Qin: School of Statistics, University of International Business and Economics, Beijing 100029, China
Qiang Sun: School of Statistics, University of International Business and Economics, Beijing 100029, China
Yidan Wang: School of Statistics, University of International Business and Economics, Beijing 100029, China
Ke-Fei Wu: Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City 242, Taiwan
Mingchih Chen: Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City 242, Taiwan
Ben-Chang Shia: Research Center of Big Data, College of Management, Taipei Medical University, Taipei 110, Taiwan
Szu-Yuan Wu: Department of Food Nutrition and Health Biotechnology, College of Medical and Health Science, Asia University, Taichung 41354, Taiwan
IJERPH, 2020, vol. 17, issue 7, 1-14
Abstract:
Predicting the number of new suspected or confirmed cases of novel coronavirus disease 2019 (COVID-19) is crucial for preventing and controlling the COVID-19 outbreak. Social media search indexes (SMSI) for dry cough, fever, chest distress, coronavirus, and pneumonia were collected from 31 December 2019 to 9 February 2020, and data on new suspected COVID-19 cases were collected from 20 January 2020 to 9 February 2020. We used lagged SMSI series to predict the number of new suspected COVID-19 cases during this period. To avoid overfitting, we estimated the coefficients with five methods: subset selection, forward selection, lasso regression, ridge regression, and elastic net. We selected the optimal method to predict new suspected COVID-19 case numbers from 20 January 2020 to 9 February 2020 and further validated it on new confirmed COVID-19 cases from 31 December 2019 to 17 February 2020. The number of new suspected COVID-19 cases correlated significantly with the lagged SMSI series, and SMSI could be detected 6–9 days earlier than new suspected cases. The optimal method was subset selection, which had the lowest estimation error and a moderate number of predictors. After validation, the subset selection method's estimates also correlated significantly with new confirmed COVID-19 cases, and SMSI at lag day 10 was significantly correlated with new confirmed cases. SMSI could therefore be a significant predictor of the number of COVID-19 infections and an effective early predictor, enabling government health departments to locate potential and high-risk outbreak areas.
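The abstract describes regressing daily new case counts on lagged SMSI series and comparing coefficient-estimation methods by estimation error. The sketch below illustrates that general idea under stated assumptions; it is not the authors' code or data. The function names (`lagged_design`, `best_subset`, `ridge_predict`) are hypothetical, the series are synthetic, the lags 6 to 9 days are chosen only to echo the lead time reported above, and "estimation error" is assumed here to be mean squared error on a hold-out split. Only two of the five methods are shown (exhaustive subset selection and closed-form ridge regression); forward selection, lasso, and elastic net are omitted.

```python
import itertools

import numpy as np


def lagged_design(smsi, lags):
    """Stack lagged copies of each search-index series into a design matrix.

    smsi: array of shape (T, k), one column per search term.
    lags: positive integer lags in days. Row r of the result corresponds to
    day t = r + max(lags) and holds smsi[t - lag, j] for each lag and term j.
    """
    smsi = np.asarray(smsi, dtype=float)
    max_lag, T = max(lags), smsi.shape[0]
    cols = [smsi[max_lag - lag:T - lag, j] for lag in lags for j in range(smsi.shape[1])]
    return np.column_stack(cols)


def ols_predict(X_tr, y_tr, X_new):
    """Least-squares fit with an intercept; returns predictions for X_new."""
    A = np.column_stack([np.ones(len(y_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return coef[0] + X_new @ coef[1:]


def best_subset(X_tr, y_tr, X_va, y_va, max_size):
    """Exhaustive subset selection: keep the column subset with lowest validation MSE."""
    best_mse, best_cols = np.inf, ()
    for size in range(1, max_size + 1):
        for cols in itertools.combinations(range(X_tr.shape[1]), size):
            idx = list(cols)
            pred = ols_predict(X_tr[:, idx], y_tr, X_va[:, idx])
            mse = np.mean((y_va - pred) ** 2)
            if mse < best_mse:
                best_mse, best_cols = mse, cols
    return best_mse, best_cols


def ridge_predict(X_tr, y_tr, X_new, alpha):
    """Closed-form ridge regression; the intercept is left unpenalised by centring."""
    x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]), Xc.T @ (y_tr - y_mean))
    return y_mean + (X_new - x_mean) @ beta


# Synthetic example (NOT the paper's data): two search terms over 60 days,
# with case counts driven by term 0 seven days earlier plus small noise.
rng = np.random.default_rng(0)
T, lags = 60, [6, 7, 8, 9]
smsi = rng.normal(size=(T, 2))
cases = np.zeros(T)
cases[7:] = 3.0 * smsi[:-7, 0] + 0.1 * rng.normal(size=T - 7)

X = lagged_design(smsi, lags)  # shape (T - 9, 8); column 2 is term 0 at lag 7
y = cases[max(lags):]
split = 35  # first 35 rows train, remaining rows validate
X_tr, X_va, y_tr, y_va = X[:split], X[split:], y[:split], y[split:]

subset_mse, subset_cols = best_subset(X_tr, y_tr, X_va, y_va, max_size=3)
ridge_mse = np.mean((y_va - ridge_predict(X_tr, y_tr, X_va, alpha=1.0)) ** 2)
print("best subset columns:", subset_cols, "validation MSE:", round(subset_mse, 4))
print("ridge validation MSE:", round(ridge_mse, 4))
```

Exhaustive subset search fits one least-squares model per candidate subset, so its cost grows combinatorially with the number of lagged columns; that is one reason penalised methods such as ridge, lasso, and elastic net are common alternatives when many lags and search terms are considered.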
Keywords: social media; COVID-19; predictor; outbreak; new case
JEL-codes: I, I1, I3, Q, Q5
Date: 2020
Citations: 9 (EconPapers)
Downloads:
https://www.mdpi.com/1660-4601/17/7/2365/pdf (application/pdf)
https://www.mdpi.com/1660-4601/17/7/2365/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:17:y:2020:i:7:p:2365-:d:339385
IJERPH is currently edited by Ms. Jenna Liu
More articles in IJERPH from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().