Interactive Learning of a Dual Convolution Neural Network for Multi-Modal Action Recognition

Qingxia Li, Dali Gao, Qieshi Zhang, Wenhong Wei and Ziliang Ren
Additional contact information
Qingxia Li: School of Computer and Information, Dongguan City College, Dongguan 523419, China
Dali Gao: School of Mathematics and Computer Science, Quanzhou Normal University, Quanzhou 362000, China
Qieshi Zhang: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Wenhong Wei: School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523808, China
Ziliang Ren: School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523808, China

Mathematics, 2022, vol. 10, issue 21, 1-15

Abstract: RGB and depth modalities contain abundant, complementary information, and convolutional neural networks (ConvNets) based on multi-modal data have made notable progress in action recognition. However, a single-stream network can hardly learn interactive features across modalities, which limits recognition performance. Inspired by multi-stream learning mechanisms and spatial-temporal information representation methods, we construct dynamic images with the rank pooling method and design an interactive learning dual ConvNet (ILD-ConvNet) with a multiplexer module to improve action recognition performance. The dynamic images constructed by rank pooling capture the spatial-temporal information of entire RGB videos. We extend this method to depth sequences to obtain richer multi-modal spatial-temporal information as the inputs of the ConvNets. In addition, we design a dual ILD-ConvNet with multiplexer modules to jointly learn the interactive features of the two streams from the RGB and depth modalities. The proposed recognition framework has been tested on two benchmark multi-modal datasets, NTU RGB + D 120 and PKU-MMD. With a temporal segmentation mechanism, the proposed ILD-ConvNet achieves accuracies of 86.9% (Cross-Subject, C-Sub) and 89.4% (Cross-Setup, C-Set) on NTU RGB + D 120, and 92.0% (Cross-Subject, C-Sub) and 93.1% (Cross-View, C-View) on PKU-MMD, which are comparable with the state of the art. The experimental results show that the proposed ILD-ConvNet with a multiplexer module can extract interactive features from different modalities to enhance action recognition performance.
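The abstract does not spell out the dynamic-image construction, but rank pooling in this setting is commonly implemented as the approximate rank pooling of Bilen et al. (2016), which collapses a video into one image via a fixed linear weighting over time. The sketch below is a minimal NumPy illustration under that assumption; the function name and the output normalization are illustrative choices, not taken from the paper.

import numpy as np

def approximate_rank_pooling(frames):
    # frames: array of shape (T, H, W, C), one RGB or depth video clip.
    # Approximate rank pooling weights (Bilen et al., 2016):
    # alpha_t = 2t - T - 1, so later frames receive larger weights.
    T = frames.shape[0]
    alpha = 2 * np.arange(1, T + 1) - T - 1
    # Weighted sum over the temporal axis yields one (H, W, C) image.
    di = np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))
    # Rescale to [0, 255] so the result can feed a standard ConvNet.
    di = (di - di.min()) / (di.max() - di.min() + 1e-8)
    return (255.0 * di).astype(np.uint8)

Applying this per modality gives one RGB dynamic image and one depth dynamic image per clip (or per temporal segment, under the paper's temporal segmentation mechanism), which then feed the two streams. The multiplexer module that couples the streams is likewise not detailed in this record; the following PyTorch sketch shows one plausible, purely hypothetical realization in which each stream's feature maps are refreshed from the concatenation of both streams via 1x1 convolutions.

import torch
import torch.nn as nn

class MultiplexerBlock(nn.Module):
    # Hypothetical cross-stream interaction block; the paper's actual
    # multiplexer design is not specified in this record.
    def __init__(self, channels):
        super().__init__()
        self.to_rgb = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.to_depth = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat, depth_feat):
        # Concatenate both streams' feature maps along the channel axis,
        # then project back so each stream sees the other's information.
        fused = torch.cat([rgb_feat, depth_feat], dim=1)
        return self.to_rgb(fused), self.to_depth(fused)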

Keywords: convolutional neural network; rank pooling; feature interactive learning; action recognition
JEL-codes: C
Date: 2022

Downloads: (external link)
https://www.mdpi.com/2227-7390/10/21/3923/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/21/3923/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:21:p:3923-:d:950357

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Handle: RePEc:gam:jmathe:v:10:y:2022:i:21:p:3923-:d:950357