InBRwSANet: Self-attention based parallel inverted residual bottleneck architecture for human action recognition in smart cities
Yasir Khan Jadoon,
Muhammad Attique Khan,
Yasir Noman Khalid,
Jamel Baili,
Nebojsa Bacanin,
MinKyung Hong and
Yunyoung Nam
PLOS ONE, 2025, vol. 20, issue 5, 1-22
Abstract:
Human Action Recognition (HAR) has grown significantly because of its many applications, including real-time surveillance and human-computer interaction. Variations in routine human actions make the recognition of actions more difficult. In this paper, we propose a novel deep learning architecture, Inverted Bottleneck Residual with Self-Attention (InBRwSA). The proposed architecture consists of two modules. In the first module, six parallel inverted bottleneck residual blocks are designed, each connected with a skip connection; these blocks learn complex human actions across many convolutional layers. The second module is based on a self-attention mechanism: the learned weights of the first module are passed to self-attention, which extracts the most essential features and can easily discriminate complex human actions. The proposed architecture is trained on the selected datasets, with hyperparameters chosen using the particle swarm optimization (PSO) algorithm. In the testing phase, the trained model extracts features from the self-attention layer, which are passed to a shallow wide neural network classifier for the final classification. HMDB51 and UCF101 are frequently used standard action recognition datasets and are chosen here to allow meaningful comparison with earlier research: UCF101 spans a wide range of activity classes, and HMDB51 contains varied real-world behaviors, so together they test the generalizability and flexibility of the presented model. Moreover, these datasets define the evaluation scope within a particular domain and guarantee relevance to real-world circumstances. The proposed technique is tested on both datasets, achieving accuracies of 78.80% on HMDB51 and 91.80% on UCF101.
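As a rough illustration of the two modules described above (not the authors' implementation), the pipeline can be sketched in NumPy: an inverted bottleneck residual block (pointwise expansion, nonlinearity, projection, plus a skip connection) applied in parallel branches, followed by scaled dot-product self-attention. All dimensions, the expansion factor, the random weights, and the averaging of the six parallel branches are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def inverted_bottleneck_block(x, expand=4):
    """Inverted residual block on feature vectors: expand -> ReLU -> project, with a skip connection.
    Weights are random placeholders; a trained model would learn them."""
    d = x.shape[-1]
    w_expand = rng.standard_normal((d, d * expand)) * 0.05   # pointwise expansion
    w_project = rng.standard_normal((d * expand, d)) * 0.05  # projection back to d
    h = np.maximum(x @ w_expand, 0.0)                        # nonlinearity in the expanded space
    return x + h @ w_project                                 # skip connection

def self_attention(x):
    """Scaled dot-product self-attention over a sequence of feature vectors."""
    d = x.shape[-1]
    wq, wk, wv = (rng.standard_normal((d, d)) * 0.05 for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over keys
    return weights @ v

# Toy forward pass: 16 frame features of dimension 32, six parallel blocks
# whose outputs are fused (here, by averaging - an assumption) before attention.
frames = rng.standard_normal((16, 32))
branches = [inverted_bottleneck_block(frames) for _ in range(6)]
fused = np.mean(branches, axis=0)
features = self_attention(fused)
print(features.shape)  # (16, 32)
```

The attention output keeps the sequence shape, so these features can be flattened or pooled before a downstream classifier such as the shallow wide neural network used in the paper.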
The ablation study yielded mean accuracies with margins of error of 70.1338 ± 3.053 (±4.35%) and 82.7813 ± 2.852 (±3.45%) at the 95% confidence level (1.960 σx̄) for the HMDB51 and UCF101 datasets, respectively. The training time for the highest accuracy is 134.09 seconds on HMDB51 and 252.10 seconds on UCF101. The proposed architecture is compared with several pre-trained deep models and state-of-the-art (SOTA) existing techniques and, based on the results, outperforms them.
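The relative margins quoted in the abstract can be checked directly from the absolute figures: dividing each margin of error by its mean accuracy reproduces the ±4.35% and ±3.45% values.

```python
# Verify the reported relative margins of error from the ablation study.
for mean, margin in [(70.1338, 3.053), (82.7813, 2.852)]:
    relative = 100 * margin / mean
    print(f"{mean} ± {margin} -> ±{relative:.2f}%")
# 70.1338 ± 3.053 -> ±4.35%
# 82.7813 ± 2.852 -> ±3.45%
```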
Date: 2025
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0322555 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 22555&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0322555
DOI: 10.1371/journal.pone.0322555