Predicting and Analyzing Road Traffic Injury Severity Using Boosting-Based Ensemble Learning Models with SHAPley Additive exPlanations
Sheng Dong,
Afaq Khattak,
Irfan Ullah,
Jibiao Zhou and
Arshad Hussain
Additional contact information
Sheng Dong: School of Civil and Transportation Engineering, Ningbo University of Technology, Fenghua Road No. 201, Ningbo 315211, China
Afaq Khattak: The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University, 4800 Cao’an Road, Jiading, Shanghai 201804, China
Irfan Ullah: Department of Civil Engineering, International Islamic University, Sector H-10, Islamabad 1243, Pakistan
Jibiao Zhou: College of Transportation Engineering, Tongji University, 4800 Cao’an Road, Jiading, Shanghai 201804, China
Arshad Hussain: NUST Institute of Civil Engineering, National University of Sciences and Technology, Sector H-12, Islamabad 44000, Pakistan
IJERPH, 2022, vol. 19, issue 5, 1-23
Abstract:
Road traffic accidents are one of the world’s most serious problems, as they result in numerous fatalities and injuries, as well as economic losses, each year. Assessing the factors that contribute to the severity of road traffic injuries has proven to be insightful, and the findings may contribute to a better understanding and potential mitigation of the risk of serious injuries associated with crashes. While ensemble learning approaches are capable of establishing complex and non-linear relationships between input risk variables and outcomes for the purpose of injury severity prediction and classification, most of them share a critical limitation: their “black-box” nature. To develop interpretable predictive models for road traffic injury severity, this paper proposes four boosting-based ensemble learning models, namely a novel Natural Gradient Boosting, Adaptive Gradient Boosting, Categorical Gradient Boosting, and Light Gradient Boosting Machine (LightGBM), and uses the recently developed SHapley Additive exPlanations (SHAP) analysis to rank the risk variables and explain the optimal model. Among the four models, LightGBM achieved the highest classification accuracy (73.63%), precision (72.61%), recall (70.09%), F1-score (70.81%), and AUC (0.71) when tested on 2015–2019 accident data from Pakistan’s National Highway N-5 (Peshawar to Rahim Yar Khan Section). By incorporating the SHAP approach, we were able to interpret the model’s estimation results from both global and local perspectives. The interpretation showed that Month_of_Year, Cause_of_Accident, Driver_Age, and Collision_Type all played a significant role in the estimation process. According to the analysis, young drivers and pedestrians struck by a trailer have a higher risk of suffering fatal injuries. The combination of trailers and passenger vehicles, as well as driver fault, pedestrian strikes, and rear-end collisions, significantly increases the risk of fatal injuries. This study suggests that combining LightGBM and SHAP has the potential to develop an interpretable model for predicting road traffic injury severity.
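The idea behind the Shapley-value attribution mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the paper applies SHAP to a trained LightGBM model, whereas the code below computes exact Shapley values by brute force for a hypothetical three-feature risk score. The function `toy_risk_score`, its coefficients, and the binary features (`young_driver`, `hit_pedestrian`, `trailer_involved`) are assumptions made purely for illustration, and features left out of a coalition are replaced by a single baseline instance, which is a simplification.

```python
from itertools import combinations
from math import factorial


def toy_risk_score(z):
    """Hypothetical fatal-injury risk score (NOT from the paper).

    z = [young_driver, hit_pedestrian, trailer_involved], each 0 or 1.
    The last term is an interaction between pedestrian and trailer.
    """
    young, pedestrian, trailer = z
    return 0.1 + 0.2 * young + 0.3 * pedestrian + 0.2 * pedestrian * trailer


def shapley_values(predict, x, baseline):
    """Exact Shapley values of instance x relative to a baseline instance.

    Features outside a coalition take their baseline value (an assumption;
    SHAP implementations typically average over a background dataset).
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    names = ["young_driver", "hit_pedestrian", "trailer_involved"]
    x = [1, 1, 1]          # instance to explain (local perspective)
    baseline = [0, 0, 0]   # reference instance
    phi = shapley_values(toy_risk_score, x, baseline)
    # Rank features by absolute contribution, largest first.
    for name, contribution in sorted(zip(names, phi), key=lambda p: -abs(p[1])):
        print(f"{name}: {contribution:.3f}")
```

For this toy score the contributions come out to roughly 0.4 for `hit_pedestrian`, 0.2 for `young_driver`, and 0.1 for `trailer_involved`, and they sum to the difference between the score for the explained instance and the score for the baseline. The interaction term is split evenly between the two features involved. Averaging absolute contributions over many instances would give a global ranking of the kind the abstract describes.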
Keywords: traffic safety; road traffic injuries; boosting-based ensemble models; SHapley Additive exPlanations
JEL-codes: I I1 I3 Q Q5
Date: 2022
Downloads:
https://www.mdpi.com/1660-4601/19/5/2925/pdf (application/pdf)
https://www.mdpi.com/1660-4601/19/5/2925/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:19:y:2022:i:5:p:2925-:d:762712
IJERPH is currently edited by Ms. Jenna Liu
Bibliographic data for series maintained by MDPI Indexing Manager.