Impact of Data Loss on Multi-Step Forecast of Traffic Flow in Urban Roads Using K-Nearest Neighbors

Mallek, Amin; Klosa, Daniel; Büskens, Christof

Amin Mallek: WG Optimisation and Optimal Control, Center for Industrial Mathematics, University of Bremen, 28359 Bremen, Germany
Daniel Klosa: WG Optimisation and Optimal Control, Center for Industrial Mathematics, University of Bremen, 28359 Bremen, Germany
Christof Büskens: WG Optimisation and Optimal Control, Center for Industrial Mathematics, University of Bremen, 28359 Bremen, Germany

Sustainability, 2022, vol. 14, issue 18, 1-18

Abstract: Data-driven models have recently proved to be a very powerful tool for extracting relevant information from different kinds of datasets. However, datasets are often subject to multiple anomalies, including the loss of important parts of their entries. In the context of intelligent transportation, this paper examines the impact of data loss on the behavior of one of the approaches most frequently used in the literature to address this kind of problem, namely the k-nearest neighbors model. The method designed herein performs multi-step traffic flow forecasts in urban roads. The study uses non-preprocessed real data recorded by seven inductive loop detectors and delivered by the Traffic Management Center (VMZ) of Bremen (Germany). Firstly, we measure the performance of the model on a complete dataset of 11 weeks. The same dataset is then used to artificially create 50 incomplete datasets with different gap sizes and completeness levels. Afterwards, in order to reconstruct these datasets, we propose three computationally inexpensive techniques, which proved through empirical testing to be efficient in reproducing missing entries. Thereafter, the performance of the E-KNN model is assessed on the original, incomplete and filled-in datasets. Although the accuracy of E-KNN on incomplete and reconstructed datasets depends on gap lengths and completeness levels, on the original dataset the model delivers six-step forecasts with an average accuracy of 83% over the 3 weeks of the test set, which also translates to an error of less than one car per minute.
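
As a rough illustration of the kind of forecast the abstract describes, the following Python sketch applies a plain k-nearest-neighbors scheme to a univariate flow series: it finds the k past windows most similar to the latest observations and averages what followed them. It is not the authors' E-KNN model, whose details are not given in this record; the function name `knn_multistep_forecast`, the parameters `lag`, `horizon` and `k`, the Euclidean distance, and the toy series are all assumptions made for the example.

```python
import math


def knn_multistep_forecast(history, lag=4, horizon=6, k=3):
    """Plain k-NN multi-step forecast (illustrative, not the paper's E-KNN).

    Compares the last `lag` values of `history` with every earlier window of
    the same length, keeps the `k` closest windows (Euclidean distance), and
    returns the element-wise mean of the `horizon` values that followed them.
    """
    if len(history) < lag + horizon:
        raise ValueError("history is too short for the chosen lag and horizon")

    query = history[-lag:]
    candidates = []
    # Only windows with a complete `horizon`-long continuation are usable.
    last_start = len(history) - lag - horizon
    for start in range(last_start + 1):
        window = history[start:start + lag]
        future = history[start + lag:start + lag + horizon]
        candidates.append((math.dist(window, query), start, future))

    # Sort by distance; ties are broken by the earlier start index.
    candidates.sort(key=lambda c: (c[0], c[1]))
    nearest = candidates[:k]
    return [
        sum(c[2][step] for c in nearest) / len(nearest)
        for step in range(horizon)
    ]


# Hypothetical, perfectly periodic flow counts (vehicles per interval).
pattern = [10, 20, 30, 40, 30, 20]
history = pattern * 5
forecast = knn_multistep_forecast(history, lag=3, horizon=6, k=2)
```

On this toy series the nearest windows repeat the query exactly, so the six-step forecast reproduces one full period of the pattern. Real detector data would need a choice of `lag` and `k` tuned on a validation set, and any gaps would have to be filled or skipped before the distance computation.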

Keywords: data loss; incomplete dataset; intelligent transportation; k-nearest neighbors; linear regression; short-term forecast; traffic flow
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2022

Downloads:
https://www.mdpi.com/2071-1050/14/18/11232/pdf (application/pdf)
https://www.mdpi.com/2071-1050/14/18/11232/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:14:y:2022:i:18:p:11232-:d:909404

Sustainability is currently edited by Ms. Alexandra Wu
