Scene Recognition for Visually-Impaired People’s Navigation Assistance Based on Vision Transformer with Dual Multiscale Attention
Yahia Said,
Mohamed Atri,
Marwan Ali Albahar,
Ahmed Ben Atitallah and
Yazan Ahmad Alsariera
Additional contact information
Yahia Said: Remote Sensing Unit, College of Engineering, Northern Border University, Arar 91431, Saudi Arabia
Mohamed Atri: College of Computer Sciences, King Khalid University, Abha 62529, Saudi Arabia
Marwan Ali Albahar: School of Computer Science, Umm Al-Qura University, Mecca 24382, Saudi Arabia
Ahmed Ben Atitallah: Department of Electrical Engineering, College of Engineering, Jouf University, Sakaka 72388, Saudi Arabia
Yazan Ahmad Alsariera: College of Science, Northern Border University, Arar 91431, Saudi Arabia
Mathematics, 2023, vol. 11, issue 5, 1-16
Abstract:
Recent technologies have made notable progress. Because a main goal of technology is to make daily life easier, we investigate the development of an intelligent system that assists visually impaired people with navigation. For visually impaired people, navigation is a complex task that requires assistance. To reduce this complexity, it helps to provide information that supports understanding of the surrounding space. In particular, recognizing indoor scenes such as a room, supermarket, or office gives a visually impaired person a valuable guide to the surrounding environment. In this paper, we propose an indoor scene recognition system based on recent deep learning techniques. The vision transformer (ViT) is a recent deep learning technique that has achieved high performance on image classification, so we deploy it for indoor scene recognition. To achieve better performance and reduce computational complexity, we propose dual multiscale attention, which collects features at different scales for a better representation. The main idea is to process small patches and large patches separately and then combine their features with a proposed fusion technique. The memory and computation of this fusion technique scale linearly, whereas those of existing techniques scale quadratically. To demonstrate the efficiency of the proposed technique, extensive experiments were performed on the public MIT67 dataset. The results demonstrate the superiority of the proposed technique over the state of the art. Furthermore, with fewer parameters and FLOPs, the proposed indoor scene recognition system is suitable for implementation on mobile devices.
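The abstract does not spell out how the fusion reaches linear cost. As a minimal sketch only, assume the fusion lets a single summary token from one branch attend to the patch tokens of the other branch (a CrossViT-style cross-attention; this is an assumption and is not confirmed by the abstract). Under that assumption, the Python below illustrates why such a step needs one score per token rather than one per token pair. The function names, the 224-pixel input, and the 16- and 32-pixel patch sizes are all hypothetical.

```python
import math


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def single_query_attention(query, tokens):
    """Attend from ONE query vector to a list of token vectors.

    Only len(tokens) dot products are computed, so the cost grows
    linearly with the number of tokens. Returns (fused_vector, weights).
    """
    dim = len(query)
    scores = [
        sum(q * t for q, t in zip(query, tok)) / math.sqrt(dim)
        for tok in tokens
    ]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    fused = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
    return fused, weights


# Hypothetical settings, not taken from the paper.
n_small = num_patches(224, 16)  # 196 tokens in the small-patch branch
n_large = num_patches(224, 32)  # 49 tokens in the large-patch branch

# Scores needed if every small-patch token attends to every large-patch token.
pairwise_scores = n_small * n_large        # 9604
# Scores needed if one summary token per branch attends to the other branch.
single_query_scores = n_small + n_large    # 245

print(pairwise_scores, single_query_scores)
```

With these assumed sizes, the single-query variant computes 245 attention scores where the all-pairs variant computes 9604, and the gap widens as the number of patches grows.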
Keywords: visually impaired; navigation assistance; vision transformer; dual multiscale attention
JEL-codes: C
Date: 2023
Downloads:
https://www.mdpi.com/2227-7390/11/5/1127/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/5/1127/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:5:p:1127-:d:1078787
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.