The ensemble learning process for short-term prediction of traffic state on rural roads
Arash Rasaizadi,
Fateme Hafizi and
Seyedehsan Seyedabrishami
Chapter 4 in Handbook on Artificial Intelligence and Transport, 2023, pp 102-123 from Edward Elgar Publishing
Abstract:
Rural road traffic data, including speed, volume, density, travel time, and traffic state, are considered “Big Data” because of their high volume, the speed at which they are generated, and their variety, which spans videos, texts, and quantitative data. Analysing and predicting these data for the short-term future provides road operators and passengers with real-time information so that trips can be planned better. Among traffic parameters, speed, volume, density, and travel time are quantitative, whereas traffic state is qualitative and is usually classified into light, semi-heavy, and heavy states. In this chapter, five years of traffic data from rural roads are used to calibrate statistical time series models and machine learning algorithms and to explore the factors that influence the real-time traffic state. First, the traffic database is presented together with calendar data (season, month, week, day, hour, holiday, and sequence of holidays) and weather data obtained from meteorological stations. Several of these variables are defined in cyclical form, and others are converted to dummy variables. In the second step, time series regression, long short-term memory (LSTM), random forest (RF), support vector machine (SVM), and K-nearest neighbours (KNN) models are trained on the first three years of data, and their performance is evaluated on the remaining two years as a test dataset. In terms of accuracy, the RF model outperformed the other models (RF accuracy, 76.9%). In terms of balanced accuracy, SVM was more accurate than RF in predicting the light traffic state (SVM accuracy, 78.7%), while the maximum balanced accuracy for predicting the semi-heavy and heavy states was achieved by RF (69.9% and 53.8%, respectively). In addition, no single model had the highest accuracy in every month of the year. These two findings, that the best model differs by traffic state and by month, motivated the use of the ordered logit (OL) model in an ensemble learning process. The input of this process is the output of the base models; combining these outputs provides a single output that is expected to be more accurate than the base models. After the OL model was calibrated using the base models' predictions for the first year of the test dataset, all models were evaluated on data from the second year. The highest accuracy (82.2%) was obtained using the OL model in the ensemble learning process.
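The abstract mentions that some calendar variables are defined in cyclical form and others are converted to dummy variables. The following is a minimal sketch of what such an encoding could look like, not the chapter's actual preprocessing. Which variables receive which treatment is an assumption here (hour and month as cyclical, season and holiday as dummies), and the function names and season labels are hypothetical.

```python
import math

# Hypothetical category labels; the chapter does not list them.
SEASONS = ["spring", "summer", "autumn", "winter"]


def cyclical(value, period):
    """Map a periodic value (e.g. hour 0-23) to a (sin, cos) pair on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def dummies(value, categories):
    """Dummy-code a categorical value, using the first category as the baseline."""
    return [1 if value == c else 0 for c in categories[1:]]


def encode(hour, month, season, is_holiday):
    """Build one feature row from calendar fields (assumed layout, for illustration)."""
    hour_sin, hour_cos = cyclical(hour, 24)
    month_sin, month_cos = cyclical(month - 1, 12)  # months are 1-12
    return [hour_sin, hour_cos, month_sin, month_cos,
            *dummies(season, SEASONS), int(is_holiday)]


def distance(a, b):
    """Euclidean distance between two (sin, cos) pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])
```

The point of the cyclical form is that hour 23 and hour 0 end up as close to each other as hour 0 and hour 1, which a plain integer encoding would not capture.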
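To illustrate how an ordered logit can combine base-model outputs into one ordered traffic state, here is a small sketch under stated assumptions. It shows only the prediction step: in practice the weights and cutpoints would be estimated by maximum likelihood on the calibration data, whereas the values below are hypothetical placeholders, not results from the chapter. Feeding the base predictions in as ordinal ranks (0, 1, 2) is also an assumption; the chapter may code them differently.

```python
import math

STATES = ["light", "semi-heavy", "heavy"]  # ordered from least to most congested

# Hypothetical parameters for illustration only; not estimated from any data.
WEIGHTS = {"ts": 0.4, "lstm": 0.8, "rf": 1.5, "svm": 1.0, "knn": 0.5}
CUTPOINTS = [2.0, 6.0]  # must be increasing; one fewer than the number of states


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logit_probs(score, cutpoints):
    """Return P(state = k) for each k, given a latent score and increasing cutpoints."""
    # Cumulative probabilities P(state <= k); the last one is 1 by definition.
    cdf = [sigmoid(c - score) for c in cutpoints] + [1.0]
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]


def ensemble_predict(base_preds, weights=WEIGHTS, cutpoints=CUTPOINTS):
    """Combine base-model state predictions into a single predicted state."""
    # Latent score: weighted sum of the ordinal ranks predicted by each base model.
    score = sum(weights[name] * STATES.index(pred) for name, pred in base_preds.items())
    probs = ordered_logit_probs(score, cutpoints)
    return STATES[probs.index(max(probs))], probs
```

For example, `ensemble_predict({"ts": "light", "lstm": "heavy", "rf": "semi-heavy", "svm": "semi-heavy", "knn": "light"})` returns the most probable state together with the three class probabilities, so a disagreement among base models is resolved by the fitted weights instead of a simple majority vote.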
Keywords: Economics and Finance; Environment; Geography; Innovations and Technology; Law - Academic; Politics and Public Policy; Urban and Regional Studies
Date: 2023
Downloads: https://www.elgaronline.com/doi/10.4337/9781803929545.00010
Persistent link: https://EconPapers.repec.org/RePEc:elg:eechap:21868_4
Ordering information: This item can be ordered from
http://www.e-elgar.com
Bibliographic data for series maintained by Darrel McCalla.