Developing a dengue forecast model using machine learning: A case study in China
Pi Guo,
Tao Liu,
Qin Zhang,
Li Wang,
Jianpeng Xiao,
Qingying Zhang,
Ganfeng Luo,
Zhihao Li,
Jianfeng He,
Yonghui Zhang and
Wenjun Ma
PLOS Neglected Tropical Diseases, 2017, vol. 11, issue 10, 1-22
Abstract:
Background: In China, dengue remains an important public health issue, with affected areas expanding and incidence increasing in recent years. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue.
Methodology/Principal findings: Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011–2014 in Guangdong were gathered. A dengue search index was constructed and combined with the climate factors to develop the predictive models. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the large 2014 outbreak were accurately forecasted by the SVR model, which was selected by a cross-validation technique. Moreover, the SVR model consistently had the smallest prediction error rates for tracking the dynamics of dengue and forecasting outbreaks in other areas of China.
Conclusion and significance: The proposed SVR model achieved superior performance in comparison with the other forecasting techniques assessed in this study. The findings can help the government and community respond early to dengue epidemics.
Author summary: Dengue epidemics have imposed a great disease burden, with affected areas expanding and incidence increasing in China recently. It has remained challenging to develop a robust and accurate forecast model and to enhance the predictability of dengue incidence. Several state-of-the-art machine learning algorithms, including the support vector regression algorithm, step-down linear regression model, gradient boosted regression tree algorithm, negative binomial regression model, least absolute shrinkage and selection operator linear regression model and generalized additive model, were compared and evaluated for forecasting dengue incidence in this study. The SVR model, selected by a cross-validation technique, was superior to the other models assessed using weekly dengue surveillance data, Baidu search query data and meteorological data during 2011–2014 in Guangdong province. The high accuracy and robustness of the proposed SVR model in predicting the occurrence of an outbreak were also validated using data from other provinces, including Yunnan, Guangxi, Hunan, Fujian and Zhejiang, spanning southern China. To the best of our knowledge, this is the first attempt to thoroughly evaluate different algorithms for dengue incidence prediction. Our identification of the optimal model will help to precisely track dengue dynamics in the country.
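The evaluation measures named in the abstract (RMSE, R-squared and residual autocorrelation) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the weekly case counts are hypothetical and made up purely for illustration, and the autocorrelation shown is the plain sample estimate at a single lag rather than a full ACF/PACF analysis.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Hypothetical weekly case counts and forecasts (illustrative only,
# not data from the study).
observed = [10, 12, 15, 20, 30, 45, 40, 25]
predicted = [11, 11, 16, 22, 28, 43, 41, 27]
residuals = [o - p for o, p in zip(observed, predicted)]

print("RMSE:", rmse(observed, predicted))
print("R-squared:", r_squared(observed, predicted))
print("Residual autocorrelation at lag 1:", autocorrelation(residuals, 1))
```

A lower RMSE and an R-squared closer to 1 indicate a better fit, while residual autocorrelations near zero suggest that the model has captured the temporal structure in the series.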
Date: 2017
Downloads: (external link)
https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0005973 (text/html)
https://journals.plos.org/plosntds/article/file?id ... 05973&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pntd00:0005973
DOI: 10.1371/journal.pntd.0005973