Accuracy in the prediction of disease epidemics when ensembling simple but highly correlated models

Shah, Denis A; De Wolf, Erick D; Paul, Pierce A; Madden, Laurence V

Accuracy in the prediction of disease epidemics when ensembling simple but highly correlated models

Denis A Shah, Erick D De Wolf, Pierce A Paul and Laurence V Madden

PLOS Computational Biology, 2021, vol. 17, issue 3, 1-23

Abstract: Ensembling combines the predictions made by individual component base models with the goal of achieving a predictive accuracy that is better than that of any one of the constituent member models. Diversity among the base models in terms of predictions is a crucial criterion in ensembling. However, there are practical instances when the available base models produce highly correlated predictions, because they may have been developed within the same research group or may have been built from the same underlying algorithm. We investigated, via a case study on Fusarium head blight (FHB) on wheat in the U.S., whether ensembles of simple yet highly correlated models for predicting the risk of FHB epidemics, all generated from logistic regression, provided any benefit to predictive performance, despite relatively low levels of base model diversity. Three ensembling methods were explored: soft voting, weighted averaging of smaller subsets of the base models, and penalized regression as a stacking algorithm. Soft voting and weighted model averages were generally better at classification than the base models, though not universally so. The performances of stacked regressions were superior to those of the other two ensembling methods we analyzed in this study. Ensembling simple yet correlated models is computationally feasible and is therefore worth pursuing for models of epidemic risk.Author summary: Ensembling takes a set of predictions from individual models and combines them such that the performance of the ensemble is ideally better than that of any one of the constituent models. Ensembling requires diversity among the individual models in terms of their predictions. However, models developed within the same research group may in fact be interrelated, and high levels of correlation among their predictions could theoretically negate any ensembling benefit. We examined, using a case study on predicting epidemics of Fusarium head blight of wheat, whether ensembling could still be beneficial when the individual models were simple but highly correlated. Even in this situation ensembling led to improvements in prediction without a high computational cost and was therefore profitable even when the diversity in model predictions was low.

Date: 2021
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008831 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 08831&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1008831

DOI: 10.1371/journal.pcbi.1008831

Access Statistics for this article

More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol ().