Uncertain imputation for time-series forecasting: Application to COVID-19 daily mortality prediction
Rayane Elimam,
Nicolas Sutton-Charani,
Stéphane Perrey and
Jacky Montmain
PLOS Digital Health, 2022, vol. 1, issue 10, 1-18
Abstract:
The aim of this study is to put forward uncertainty modeling associated with the imputation of missing time-series data in a predictive context. We propose three imputation methods, each associated with an uncertainty model. These methods are evaluated on a COVID-19 dataset from which some values have been randomly removed. The dataset contains the numbers of daily confirmed COVID-19 diagnoses (“new cases”) and daily deaths (“new deaths”) recorded from the start of the pandemic up to July 2021. The task considered is to predict the number of new deaths 7 days in advance. The more values are missing, the greater the impact of imputation on predictive performance. The Evidential K-Nearest Neighbors (EKNN) algorithm is used for its ability to take label uncertainty into account. Experiments are provided to measure the benefits of the label uncertainty models. Results show the positive impact of uncertainty models on imputation performance, especially in a noisy context where the number of missing values is high.
Author Summary:
The methodological aim of this study was to take advantage of the chronology of missing data in the imputation process in order to handle missing time-series data. The practical goal of the COVID application was to study the link between the chronological numbers of confirmed COVID cases and deaths. To achieve these goals, we proposed three imputation methods for missing time-series data, each associated with an uncertainty model. For the prediction task, we set up a non-linear regression model that predicts the number of COVID deaths from past deaths and confirmed-case data. This led us to extend the Evidential K-Nearest Neighbors method to regression problems and to assess the impact of uncertainty modeling within the imputation process with regard to the predictive task. Finally, we showed the superiority of the time-EKNN (TEKNN) in terms of predictive performance compared to the Last Observation Carried Forward (LOCF) and Centered Moving Average (CMA) methods. More generally, we showed the value of modeling uncertainty in the imputation process to better control the prediction error, especially during relatively stable periods.
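As a rough illustration of the two baseline imputation methods named in the summary, the sketch below fills gaps in a short daily series with LOCF and with a centered moving average. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `half_window` parameter, and the toy series are hypothetical, and neither the uncertainty models nor the TEKNN method from the paper are reproduced here.

```python
import numpy as np

def locf_impute(series):
    """Last Observation Carried Forward: replace each missing value (NaN)
    with the most recent earlier value. Leading NaNs are left unfilled."""
    filled = np.asarray(series, dtype=float).copy()
    for t in range(1, len(filled)):
        if np.isnan(filled[t]):
            filled[t] = filled[t - 1]
    return filled

def cma_impute(series, half_window=1):
    """Centered Moving Average: replace each missing value with the mean of
    the observed values within `half_window` steps on either side.
    `half_window` is an illustrative parameter, not taken from the paper."""
    values = np.asarray(series, dtype=float)
    filled = values.copy()
    for t in np.flatnonzero(np.isnan(values)):
        lo = max(0, t - half_window)
        hi = min(len(values), t + half_window + 1)
        window = values[lo:hi]
        observed = window[~np.isnan(window)]
        if observed.size:  # leave the gap if no neighbor is observed
            filled[t] = observed.mean()
    return filled

# Toy daily counts with three values removed (hypothetical data).
daily_deaths = [10, np.nan, np.nan, 16, np.nan, 20]
print(locf_impute(daily_deaths))  # [10. 10. 10. 16. 16. 20.]
print(cma_impute(daily_deaths))   # [10. 10. 16. 16. 18. 20.]
```

The two methods disagree most inside longer gaps: LOCF repeats a stale value, while the centered average borrows from whichever neighbors are observed. This gives one intuition for why attaching an uncertainty model to imputed values can matter more as the share of missing data grows.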
Date: 2022
Downloads: (external link)
https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000115 (text/html)
https://journals.plos.org/digitalhealth/article/fi ... 00115&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pdig00:0000115
DOI: 10.1371/journal.pdig.0000115