A hybrid model combining depthwise separable convolutions and vision transformers for traffic sign classification under challenging weather conditions
Milind Vijay Parse, Dhanya Pramod and Deepak Kumar
Affiliations:
Milind Vijay Parse: Symbiosis International (Deemed University) (SIU)
Dhanya Pramod: Symbiosis Centre for Information Technology (SCIT), Symbiosis International (Deemed University)
Deepak Kumar: Amity University Uttar Pradesh
International Journal of System Assurance Engineering and Management, 2025, vol. 16, issue 8, No 6, 2720-2742
Abstract:
This research presents a deep-learning framework for classifying traffic sign images under adverse conditions, including rain, shadows, haze, codec errors, and dirty lenses. To balance accuracy against the number of trainable parameters, the approach combines depthwise and pointwise convolutions, together known as depthwise separable convolutions, with a Vision Transformer (ViT) for subsequent feature extraction. The framework's initial block comprises two pairs of depthwise and pointwise convolutional layers followed by a normalization layer. Depthwise convolution processes each input channel independently, applying a separate filter to each channel, which reduces computational cost and parameter count while preserving spatial structure. Pointwise (1 × 1) convolutional layers then combine information across channels, enabling richer cross-channel feature interactions. Batch normalization stabilizes training. At the end of the initial block, a max pooling layer downsamples the spatial dimensions. This block structure is repeated four times, with skip connections preserving crucial information. To capture global context, inter-block skip connections are used, and global average pooling (GAP) reduces dimensionality while retaining vital information. A ViT integrated into the final layers captures long-range dependencies and relations in the feature maps. The framework concludes with two fully connected layers: a bottleneck layer with 1024 neurons and an output layer with softmax activation that produces a probability distribution over 14 classes. By combining convolution blocks and skip connections with carefully tuned ViT hyperparameters, the proposed framework achieves a validation accuracy of 99.3%.
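The parameter saving that motivates depthwise separable convolutions can be made concrete. The following NumPy sketch is illustrative only and is not the authors' implementation: it assumes stride 1, "valid" padding, no biases, and channels-last (H, W, C) arrays, and the function names (`depthwise_conv2d`, `pointwise_conv2d`, `separable_conv2d`, `param_counts`) are hypothetical.

```python
import numpy as np


def depthwise_conv2d(x, dw_kernels):
    """Depthwise convolution (cross-correlation), stride 1, 'valid' padding.

    x: (H, W, C) input; dw_kernels: (k, k, C), one k x k filter per channel.
    Output channel c depends only on input channel c.
    """
    k = dw_kernels.shape[0]
    H, W, C = x.shape
    out = np.zeros((H - k + 1, W - k + 1, C))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            patch = x[i:i + k, j:j + k, :]  # (k, k, C)
            # Sum over the spatial window only, never across channels.
            out[i, j, :] = np.sum(patch * dw_kernels, axis=(0, 1))
    return out


def pointwise_conv2d(x, pw_weights):
    """1 x 1 convolution: mixes channels independently at each position.

    x: (H, W, C_in); pw_weights: (C_in, C_out).
    """
    return x @ pw_weights  # matmul over the last axis -> (H, W, C_out)


def separable_conv2d(x, dw_kernels, pw_weights):
    """Depthwise step followed by pointwise step."""
    return pointwise_conv2d(depthwise_conv2d(x, dw_kernels), pw_weights)


def param_counts(k, c_in, c_out):
    """Weights (no biases) of a standard k x k conv vs. its separable version."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
y = separable_conv2d(x, rng.standard_normal((3, 3, 3)),
                     rng.standard_normal((3, 16)))
print(y.shape)                   # (6, 6, 16)
print(param_counts(3, 64, 128))  # (73728, 8768)
```

For k = 3, C_in = 64 and C_out = 128, the separable version needs 8,768 weights instead of 73,728, roughly 12%; in general the ratio is 1/C_out + 1/k². The separable result equals a standard convolution whose kernel is constrained to the factorized form dw[a, b, c] · pw[c, o], which is where the saving comes from.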
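The classification head described in the abstract (a 1024-neuron bottleneck followed by a softmax layer over 14 classes) can likewise be sketched in NumPy, together with global average pooling. This is a hypothetical illustration under stated assumptions: the ViT stage is omitted, the 256-channel feature map and the random weights are placeholders, and the ReLU on the bottleneck layer is an assumption, since the abstract does not name that activation.

```python
import numpy as np


def global_average_pool(fmap):
    """(batch, H, W, C) feature maps -> (batch, C) by averaging over H and W."""
    return fmap.mean(axis=(1, 2))


def softmax(z):
    """Row-wise softmax; subtracting the max avoids overflow in exp."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def classification_head(features, w1, b1, w2, b2):
    """Bottleneck dense layer, then a softmax output layer.

    features: (batch, d); w1: (d, 1024); w2: (1024, num_classes).
    ReLU on the bottleneck is an assumption, not taken from the paper.
    """
    hidden = np.maximum(features @ w1 + b1, 0.0)  # (batch, 1024)
    return softmax(hidden @ w2 + b2)              # (batch, num_classes)


rng = np.random.default_rng(0)
NUM_CLASSES, BOTTLENECK, CHANNELS = 14, 1024, 256  # CHANNELS is a placeholder
fmap = rng.standard_normal((2, 7, 7, CHANNELS))    # hypothetical feature maps
probs = classification_head(
    global_average_pool(fmap),
    rng.standard_normal((CHANNELS, BOTTLENECK)) * 0.05, np.zeros(BOTTLENECK),
    rng.standard_normal((BOTTLENECK, NUM_CLASSES)) * 0.05, np.zeros(NUM_CLASSES),
)
print(probs.shape)  # (2, 14)
```

Each row of `probs` sums to 1, which is what makes the output a probability distribution over the 14 classes; the predicted class for each image is `probs.argmax(axis=1)`.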
Keywords: Challenging weather conditions; Depthwise convolution; Depthwise separable convolutions; Image classification; Pointwise convolution; Vision Transformers
Date: 2025
Downloads: http://link.springer.com/10.1007/s13198-025-02827-z (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:ijsaem:v:16:y:2025:i:8:d:10.1007_s13198-025-02827-z
Ordering information: This journal article can be ordered from http://www.springer.com/engineering/journal/13198
DOI: 10.1007/s13198-025-02827-z
International Journal of System Assurance Engineering and Management is currently edited by P.K. Kapur, A.K. Verma and U. Kumar
More articles in International Journal of System Assurance Engineering and Management from Springer, The Society for Reliability, Engineering Quality and Operations Management (SREQOM), India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.