Temporal–Semantic Aligning and Reasoning Transformer for Audio-Visual Zero-Shot Learning
Kaiwen Zhang,
Kunchen Zhao and
Yunong Tian ()
Additional contact information
Kaiwen Zhang: School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
Kunchen Zhao: School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
Yunong Tian: CAS Engineering Laboratory for Intelligent Industrial Vision Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Mathematics, 2024, vol. 12, issue 14, 1-16
Abstract:
Zero-shot learning (ZSL) enables models to recognize categories not encountered during training, which is crucial for categories with limited data. Existing methods overlook efficient temporal modeling in multimodal data. This paper proposes a Temporal–Semantic Aligning and Reasoning Transformer (TSART) for spatio-temporal modeling. TSART uses the pre-trained SeLaVi network to extract audio and visual features and explores the semantic information of these modalities through audio and visual encoders. It incorporates a temporal information reasoning module to enhance the capture of temporal features in audio, and a cross-modal reasoning module to effectively integrate audio and visual information, establishing a robust joint embedding representation. Our experimental results validate the effectiveness of this approach, demonstrating outstanding Generalized Zero-Shot Learning (GZSL) performance on the UCF101 Generalized Zero-Shot Learning (UCF-GZSL), VGGSound-GZSL, and ActivityNet-GZSL datasets, with notable improvements in the Harmonic Mean (HM) evaluation. These results indicate that TSART has great potential in handling complex spatio-temporal information and multimodal fusion.
Keywords: audio-visual zero-shot learning; transformer (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2024
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/2227-7390/12/14/2200/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/14/2200/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:14:p:2200-:d:1434495
Access Statistics for this article
Mathematics is currently edited by Ms. Emma He
More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().