Relational Action Bank with Semantic–Visual Attention for Few-Shot Action Recognition

Haoming Liang, Jinze Du (), Hongchen Zhang, Bing Han and Yan Ma
Additional contact information
Haoming Liang: School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
Jinze Du: School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
Hongchen Zhang: School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
Bing Han: School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
Yan Ma: State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing 100191, China

Future Internet, 2023, vol. 15, issue 3, 1-16

Abstract: Few-shot learning has recently attracted significant attention in video action recognition owing to its data-efficient learning paradigm. Despite encouraging progress, how to further improve few-shot performance by exploiting additional or auxiliary information remains an open challenge. To address this problem, we make a first attempt at a relational action bank with semantic–visual attention for few-shot action recognition. Specifically, we introduce a relational action bank as an auxiliary library that helps the network understand actions in novel classes. Meanwhile, semantic–visual attention is devised to adaptively capture connections to previously seen actions through both semantic correlation and visual similarity. We extensively evaluate our approach with two backbone models (ResNet-50 and C3D) on the HMDB and Kinetics datasets and show that the proposed model significantly outperforms state-of-the-art methods. Notably, it achieves an average improvement of about 6.2% over the second-best method on the Kinetics dataset.
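The abstract's core idea can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, the cosine-similarity choice, and the convex fusion of semantic and visual scores via `alpha` are all assumptions introduced here to show how attention over a bank of known actions might combine semantic correlation with visual similarity.

```python
import numpy as np

def semantic_visual_attention(query_visual, bank_visual,
                              query_semantic, bank_semantic, alpha=0.5):
    """Weight action-bank entries by fused semantic and visual similarity.

    All names and the fusion rule are illustrative assumptions, not the
    paper's exact formulation.
    """
    def cosine(a, B):
        # cosine similarity between one vector and each row of a matrix
        a = a / (np.linalg.norm(a) + 1e-8)
        B = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-8)
        return B @ a

    vis = cosine(query_visual, bank_visual)       # visual similarity to each bank action
    sem = cosine(query_semantic, bank_semantic)   # semantic correlation to each bank action
    scores = alpha * sem + (1 - alpha) * vis      # simple convex fusion (assumption)
    w = np.exp(scores - scores.max())
    w = w / w.sum()                               # softmax attention weights over the bank
    return w @ bank_visual                        # attended bank feature for the query

# usage: a bank of 5 known actions, 512-d visual features, 300-d word embeddings
rng = np.random.default_rng(0)
ctx = semantic_visual_attention(rng.standard_normal(512),
                                rng.standard_normal((5, 512)),
                                rng.standard_normal(300),
                                rng.standard_normal((5, 300)))
print(ctx.shape)  # (512,)
```

The attended feature `ctx` could then augment the query's own representation before few-shot classification; how the bank features and fused representation are actually constructed is specific to the paper.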

Keywords: semantic attention; visual attention; relational action bank; few-shot learning; action recognition
JEL-codes: O3
Date: 2023

Downloads: (external link)
https://www.mdpi.com/1999-5903/15/3/101/pdf (application/pdf)
https://www.mdpi.com/1999-5903/15/3/101/ (text/html)


Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:15:y:2023:i:3:p:101-:d:1086754

Future Internet is currently edited by Ms. Grace You

Handle: RePEc:gam:jftint:v:15:y:2023:i:3:p:101-:d:1086754