Predicting Future Driving Risk of Crash-Involved Drivers Based on a Systematic Machine Learning Framework

Wang, Chen; Liu, Lin; Xu, Chengcheng; Lv, Weitao

Chen Wang, Lin Liu, Chengcheng Xu and Weitao Lv
Additional contact information
Chen Wang: Jiangsu Key Laboratory of Urban ITS, Southeast University, Nanjing 210096, China
Lin Liu: Jiangsu Intelligent Transportation Systems Co., Ltd., Nanjing 210096, China
Chengcheng Xu: Jiangsu Key Laboratory of Urban ITS, Southeast University, Nanjing 210096, China
Weitao Lv: Jiangsu Intelligent Transportation Systems Co., Ltd., Nanjing 210096, China

IJERPH, 2019, vol. 16, issue 3, 1-18

Abstract: The objective of this paper is to predict the future driving risk of crash-involved drivers in Kunshan, China. A systematic machine learning framework is proposed to address three critical technical issues: (1) defining driving risk; (2) developing risky driving factors; and (3) developing a reliable and explicable machine learning model. High-risk (HR) and low-risk (LR) drivers were defined under five different scenarios. A number of features were extracted from seven years of crash/violation records. Drivers’ crash/violation information from the prior two years was used to predict their driving risk in the subsequent two years. Using a one-year rolling time window, prediction models were developed for four consecutive time periods: 2013–2014, 2014–2015, 2015–2016, and 2016–2017. Four tree-based ensemble learning techniques were tested: random forest (RF), AdaBoost with decision trees, gradient boosting decision tree (GBDT), and extreme gradient boosting decision tree (XGBoost). A temporal transferability test and a follow-up study were applied to validate the trained models. The best scenario for defining driving risk was multi-dimensional, encompassing crash recurrence, severity, and fault commitment. GBDT appeared to be the best model choice across all time periods, with an acceptable average precision (AP) of 0.68 on the most recent datasets (i.e., 2016–2017). Seven of the nine top features were related to risky driving behaviors, which showed non-linear relationships with driving risk. Model transferability held within relatively short time intervals (1–2 years). Appropriate risk definition, complicated violation/crash features, and advanced machine learning techniques need to be considered for the risk prediction task. The proposed machine learning approach is promising and could allow safety interventions to be launched more effectively.
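
The rolling-window design and the AP metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the seven years of records span 2011–2017, which is inferred from the four target periods and the two-year history rather than stated in the abstract, and it assumes AP is the usual ranking-based average precision (mean of precision at each rank where a high-risk driver appears, with no special handling of tied scores). The function names and the toy labels and scores are hypothetical.

```python
from typing import List, Sequence, Tuple

# ((feature_start, feature_end), (target_start, target_end)), inclusive years
Window = Tuple[Tuple[int, int], Tuple[int, int]]


def rolling_windows(first_year: int, last_year: int,
                    history: int = 2, horizon: int = 2,
                    step: int = 1) -> List[Window]:
    """Pair each `history`-year feature span with the `horizon`-year span after it."""
    windows = []
    start = first_year
    while start + history + horizon - 1 <= last_year:
        feature = (start, start + history - 1)
        target = (start + history, start + history + horizon - 1)
        windows.append((feature, target))
        start += step  # slide the whole window forward by `step` years
    return windows


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision-at-k over the ranks k where a positive (label 1) appears."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Assumed record span (2011-2017); yields the four target periods in the abstract.
windows = rolling_windows(2011, 2017)
for feature, target in windows:
    print(f"features {feature[0]}-{feature[1]} -> target {target[0]}-{target[1]}")

# Toy labels (1 = high-risk) and made-up model scores, for illustration only.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(average_precision(toy_labels, toy_scores), 3))
```

Under the assumed span, the sketch produces exactly four feature/target pairs, which is consistent with the four periods listed in the abstract. AP is a natural choice when high-risk drivers are a minority class, because it rewards ranking them ahead of low-risk drivers rather than overall accuracy.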

Keywords: driving risk; traffic violation behavior; machine learning; temporal transferability
JEL-codes: I I1 I3 Q Q5
Date: 2019
Citations in EconPapers: 2

Downloads:
https://www.mdpi.com/1660-4601/16/3/334/pdf (application/pdf)
https://www.mdpi.com/1660-4601/16/3/334/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:16:y:2019:i:3:p:334-:d:200807

IJERPH is currently edited by Ms. Jenna Liu
