DFA-SAT: Dynamic Feature Abstraction with Self-Attention-Based 3D Object Detection for Autonomous Driving

Mushtaq, Husnain; Deng, Xiaoheng; Ali, Mubashir; Hayat, Babur; Sherazi, Hafiz Husnain Raza

Husnain Mushtaq, Xiaoheng Deng, Mubashir Ali, Babur Hayat, and Hafiz Husnain Raza Sherazi
Additional contact information
Husnain Mushtaq: School of Computer Science and Engineering, Central South University, Changsha 410083, China
Xiaoheng Deng: School of Computer Science and Engineering, Central South University, Changsha 410083, China
Mubashir Ali: School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK
Babur Hayat: Department of Computer Science, University of Chenab, Gujrat 50700, Pakistan
Hafiz Husnain Raza Sherazi: School of Computing and Engineering, University of West London, London W5 5RF, UK

Sustainability, 2023, vol. 15, issue 18, 1-21

Abstract: Autonomous vehicles (AVs) play a crucial role in enhancing urban mobility in smarter, more connected cities. Three-dimensional object detection is essential for AVs to comprehend the driving environment and operate safely in urban settings. Existing 3D LiDAR object detection systems lose many critical point features during down-sampling and neglect the interactions between local features, which yields insufficient semantic information and subpar detection performance. We propose dynamic feature abstraction with self-attention (DFA-SAT), which uses self-attention to learn semantic features with contextual information by incorporating neighboring data and focusing on vital geometric details. DFA-SAT comprises four modules: object-based down-sampling (OBDS), semantic and contextual feature extraction (SCFE), multi-level feature re-weighting (MLFR), and local and global features aggregation (LGFA). The OBDS module preserves the maximum number of semantic foreground points along with their spatial information. SCFE learns rich semantic and contextual information with respect to spatial dependencies, refining the point features. MLFR decodes all the point features using a channel-wise multi-layered transformer approach. LGFA combines local features with decoding weights for global features using matrix product keys and query embeddings to learn spatial information across each channel. Extensive experiments on the KITTI dataset demonstrate significant improvements over the mainstream methods SECOND and PointPillars, improving the mean average precision (AP) by 6.86% and 6.43%, respectively, on the KITTI test set. DFA-SAT yields better and more stable performance at medium and long distances with a limited impact on real-time performance and model parameters, supporting a transformative shift akin to the one that occurred when automobiles replaced conventional transportation in cities.
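
The mechanism at the center of the abstract, self-attention over point features, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes identity query, key, and value projections in place of learned ones, it leaves out the OBDS, MLFR, and LGFA stages entirely, and the names `softmax`, `self_attention`, and the toy `points` are hypothetical, chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    features: list of N point-feature vectors, each of length d.
    Returns N refined vectors; each one is a weighted average of all
    input vectors, so every point absorbs context from its neighbors.
    Assumption: a real detector would apply learned projections first.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    refined = []
    for query in features:
        # Similarity of this point to every point (itself included).
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in features
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors, channel by channel.
        refined.append([
            sum(w * value[c] for w, value in zip(weights, features))
            for c in range(d)
        ])
    return refined


# Hypothetical toy input: three points with 2-D features.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
refined = self_attention(points)
```

In this toy input the first two points have similar features, so the first point's refined vector stays closer to them than the third point's does. That is the sense in which attention lets a point's feature reflect its context.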

Keywords: smart cities; 3D object detection; semantic feature learning; self-attention
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2023

Downloads:
https://www.mdpi.com/2071-1050/15/18/13667/pdf (application/pdf)
https://www.mdpi.com/2071-1050/15/18/13667/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:15:y:2023:i:18:p:13667-:d:1238815

Sustainability is currently edited by Ms. Alexandra Wu
