Enhancing Emergency Vehicle Detection: A Deep Learning Approach with Multimodal Fusion

Zohaib, Muhammad; Asim, Muhammad; ELAffendi, Mohammed

Enhancing Emergency Vehicle Detection: A Deep Learning Approach with Multimodal Fusion

Muhammad Zohaib, Muhammad Asim () and Mohammed ELAffendi
Additional contact information
Muhammad Zohaib: School of Computer Science and Engineering, Central South University, Changsha 410083, China
Muhammad Asim: EIAS Data Science and Blockchain Laboratory, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia
Mohammed ELAffendi: EIAS Data Science and Blockchain Laboratory, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia

Mathematics, 2024, vol. 12, issue 10, 1-23

Abstract: Emergency vehicle detection plays a critical role in ensuring timely responses and reducing accidents in modern urban environments. However, traditional methods that rely solely on visual cues face challenges, particularly in adverse conditions. The objective of this research is to enhance emergency vehicle detection by leveraging the synergies between acoustic and visual information. By incorporating advanced deep learning techniques for both acoustic and visual data, our aim is to significantly improve the accuracy and response times. To achieve this goal, we developed an attention-based temporal spectrum network (ATSN) with an attention mechanism specifically designed for ambulance siren sound detection. In parallel, we enhanced visual detection tasks by implementing a Multi-Level Spatial Fusion YOLO (MLSF-YOLO) architecture. To combine the acoustic and visual information effectively, we employed a stacking ensemble learning technique, creating a robust framework for emergency vehicle detection. This approach capitalizes on the strengths of both modalities, allowing for a comprehensive analysis that surpasses existing methods. Through our research, we achieved remarkable results, including a misdetection rate of only 3.81% and an accuracy of 96.19% when applied to visual data containing emergency vehicles. These findings represent significant progress in real-world applications, demonstrating the effectiveness of our approach in improving emergency vehicle detection systems.

Keywords: deep learning; image classification; multimodal; attention-based temporal spectrum; ensemble learning; road safety (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2024
References: View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
https://www.mdpi.com/2227-7390/12/10/1514/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/10/1514/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:10:p:1514-:d:1393527

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().