
Analyzing the Effectiveness of Imbalanced Data Handling Techniques in Predicting Driver Phone Use

Madhar M. Taamneh, Salah Taamneh, Ahmad H. Alomari and Musab Abuaddous
Additional contact information
Madhar M. Taamneh: Department of Civil Engineering, Yarmouk University, P.O. Box 566, Irbid 21163, Jordan
Salah Taamneh: Department of Computer Science and Applications, Faculty of Prince Al-Hussien Bin Abdullah for IT, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan
Ahmad H. Alomari: Department of Civil Engineering, Yarmouk University, P.O. Box 566, Irbid 21163, Jordan
Musab Abuaddous: Department of Civil Engineering, Yarmouk University, P.O. Box 566, Irbid 21163, Jordan

Sustainability, 2023, vol. 15, issue 13, 1-20

Abstract: Distracted driving leads to a significant number of road crashes worldwide. Smartphone use is one of the most common causes of cognitive distraction among drivers. Available data on drivers’ phone use present an invaluable opportunity to identify the main factors behind this behavior, and machine learning (ML) techniques are among the most effective tools for this purpose. However, the usefulness of these techniques is limited by the imbalance of the available data: the majority class consists of drivers who do not use their phones, while the minority class consists of those who do. This paper evaluates two main approaches for handling imbalanced datasets on driver phone use: oversampling and undersampling. The effectiveness of each approach was evaluated using six ML techniques: Multilayer Perceptron (MLP), Support Vector Machine (SVM), Naive Bayes (NB), Bayesian Network (BayesNet), J48, and ID3. The approaches were also evaluated on three Deep Learning (DL) models: Arch1 (5 hidden layers), Arch2 (10 hidden layers), and Arch3 (15 hidden layers). The data used in this paper were collected through a direct observation study to explore a set of human, vehicle, and road surface characteristics. The results showed that all ML and DL methods achieved balanced accuracy values for both classes. ID3, J48, and MLP outperformed the other ML methods in all scenarios, with ID3 achieving slightly better accuracy. The DL methods also performed well, especially on the undersampled data. Overall, the classification methods performed best on the undersampled data. It was concluded that road classification has the highest impact on cell phone use, followed by driver age group, driver gender, vehicle type, and, finally, driver seatbelt usage.
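
The two balancing approaches named in the abstract can be illustrated with a minimal sketch. The abstract does not say which oversampling or undersampling variant the authors used, so plain random resampling is assumed here; the function names, the class labels, and the toy 90/10 split are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(items) for items in by_class.values())
    pairs = []
    for label, items in by_class.items():
        # sample() draws without replacement, so no row is duplicated
        pairs.extend((row, label) for row in rng.sample(items, target))
    rng.shuffle(pairs)
    return [row for row, _ in pairs], [label for _, label in pairs]


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows until every class is as large as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(items) for items in by_class.values())
    pairs = []
    for label, items in by_class.items():
        # keep every original row, then add random copies to reach the target
        extra = [rng.choice(items) for _ in range(target - len(items))]
        pairs.extend((row, label) for row in items + extra)
    rng.shuffle(pairs)
    return [row for row, _ in pairs], [label for _, label in pairs]


# Hypothetical toy data: 90 observed non-users and 10 observed phone users.
toy_labels = ["no_phone"] * 90 + ["phone"] * 10
toy_rows = list(range(100))  # stand-ins for feature records

under_rows, under_labels = random_undersample(toy_rows, toy_labels)
over_rows, over_labels = random_oversample(toy_rows, toy_labels)

print(sorted(Counter(under_labels).items()))  # [('no_phone', 10), ('phone', 10)]
print(sorted(Counter(over_labels).items()))   # [('no_phone', 90), ('phone', 90)]
```

Undersampling discards majority-class observations, while oversampling repeats minority-class observations, so the two balanced datasets differ in size and in how much original information they retain. In practice, resampling is usually applied to the training split only, so that evaluation data keep the original class distribution.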

Keywords: imbalanced data; Multilayer Perceptron; Naive Bayes; decision tree; driver phone use; traffic safety
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2023

Downloads: (external link)
https://www.mdpi.com/2071-1050/15/13/10668/pdf (application/pdf)
https://www.mdpi.com/2071-1050/15/13/10668/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:15:y:2023:i:13:p:10668-:d:1188179

Sustainability is currently edited by Ms. Alexandra Wu

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jsusta:v:15:y:2023:i:13:p:10668-:d:1188179