SlowFast Multimodality Compensation Fusion Swin Transformer Networks for RGB-D Action Recognition
Xiongjiang Xiao,
Ziliang Ren,
Huan Li,
Wenhong Wei,
Zhiyong Yang and
Huaide Yang
Additional contact information
Xiongjiang Xiao: School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523820, China
Ziliang Ren: School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523820, China
Huan Li: School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523820, China
Wenhong Wei: School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523820, China
Zhiyong Yang: School of Artificial Intelligence, Yantai Institute of Technology, Yantai 264003, China
Huaide Yang: School of Electronic Information, Dongguan Polytechnic, Dongguan 523109, China
Mathematics, 2023, vol. 11, issue 9, 1-19
Abstract:
RGB-D-based technology combines the advantages of RGB and depth sequences, which allows it to recognize human actions effectively in different environments. However, it is difficult for the different modalities to learn spatio-temporal information from each other effectively. To enhance the information exchange between modalities, we introduce a SlowFast multimodality compensation block (SFMCB), which is designed to extract compensation features. Concretely, the SFMCB fuses features from two independent pathways with different frame rates into a single convolutional neural network to achieve performance gains for the model. Furthermore, we explore two fusion schemes for combining the features from these two pathways. To facilitate the learning of features from the multiple independent pathways, multiple loss functions are used for joint optimization. To evaluate the proposed architecture, we conducted experiments on four challenging datasets: NTU RGB+D 60, NTU RGB+D 120, THU-READ, and PKU-MMD. Experimental results demonstrate the effectiveness of the proposed model, which uses the SFMCB mechanism to capture complementary features of multimodal inputs.
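The abstract does not specify the two fusion schemes or the individual loss terms, so the following is only a rough sketch of the general idea: two pathways that sample the same clip at different frame rates, a fusion step, and a joint loss summed over several heads. The function names, the strides, the choice of sum and concatenation as fusion schemes, and the use of cross-entropy are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def sample_pathways(num_frames, slow_stride=8, fast_stride=2):
    """Frame indices for a slow (sparse) and a fast (dense) pathway.

    The stride values are illustrative assumptions, not taken from the paper.
    """
    slow_idx = np.arange(0, num_frames, slow_stride)
    fast_idx = np.arange(0, num_frames, fast_stride)
    return slow_idx, fast_idx


def pool_fast_to_slow(fast_feat, ratio):
    """Average every `ratio` consecutive fast frames so both pathways align in time.

    fast_feat has shape (T_fast, C) with T_fast == ratio * T_slow.
    """
    t_fast, channels = fast_feat.shape
    return fast_feat.reshape(t_fast // ratio, ratio, channels).mean(axis=1)


def fuse_sum(slow_feat, fast_feat, ratio):
    """Hypothetical fusion scheme 1: element-wise sum after temporal alignment."""
    return slow_feat + pool_fast_to_slow(fast_feat, ratio)


def fuse_concat(slow_feat, fast_feat, ratio):
    """Hypothetical fusion scheme 2: channel-wise concatenation after alignment."""
    return np.concatenate([slow_feat, pool_fast_to_slow(fast_feat, ratio)], axis=1)


def cross_entropy(logits, label):
    """Numerically stable cross-entropy for one sample with an integer label."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def joint_loss(logits_per_head, label, weights=None):
    """Weighted sum of per-head losses (e.g. slow head, fast head, fused head)."""
    if weights is None:
        weights = [1.0] * len(logits_per_head)
    return sum(w * cross_entropy(l, label) for w, l in zip(weights, logits_per_head))


# Usage: a 32-frame clip with stand-in 8-channel features per sampled frame.
rng = np.random.default_rng(0)
slow_idx, fast_idx = sample_pathways(32)
ratio = len(fast_idx) // len(slow_idx)
slow_feat = rng.normal(size=(len(slow_idx), 8))
fast_feat = rng.normal(size=(len(fast_idx), 8))
fused_a = fuse_sum(slow_feat, fast_feat, ratio)
fused_b = fuse_concat(slow_feat, fast_feat, ratio)
```

In this sketch the sum keeps the channel count unchanged while the concatenation doubles it, which is the usual trade-off between the two; whether the paper's schemes behave this way is not stated in the abstract.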
Keywords: action recognition; multimodality compensation; SlowFast pathways; swin transformer; dual-stream
JEL-codes: C
Date: 2023
Downloads:
https://www.mdpi.com/2227-7390/11/9/2115/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/9/2115/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:9:p:2115-:d:1136618
Mathematics is currently edited by Ms. Emma He