Temporal–Semantic Aligning and Reasoning Transformer for Audio-Visual Zero-Shot Learning

Kaiwen Zhang, Kunchen Zhao and Yunong Tian
Additional contact information
Kaiwen Zhang: School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
Kunchen Zhao: School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
Yunong Tian: CAS Engineering Laboratory for Intelligent Industrial Vision, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Mathematics, 2024, vol. 12, issue 14, 1-16

Abstract: Zero-shot learning (ZSL) enables models to recognize categories not encountered during training, which is crucial for categories with limited data. Existing methods overlook efficient temporal modeling in multimodal data. This paper proposes a Temporal–Semantic Aligning and Reasoning Transformer (TSART) for spatio-temporal modeling. TSART uses the pre-trained SeLaVi network to extract audio and visual features and explores the semantic information of these modalities through audio and visual encoders. It incorporates a temporal information reasoning module to enhance the capture of temporal features in audio, and a cross-modal reasoning module to effectively integrate audio and visual information, establishing a robust joint embedding representation. Our experimental results validate the effectiveness of this approach, demonstrating outstanding Generalized Zero-Shot Learning (GZSL) performance on the UCF-GZSL (based on UCF101), VGGSound-GZSL, and ActivityNet-GZSL datasets, with notable improvements in the Harmonic Mean (HM) evaluation. These results indicate that TSART has great potential in handling complex spatio-temporal information and multimodal fusion.
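
The Harmonic Mean (HM) named in the abstract is commonly defined in the GZSL literature as the harmonic mean of the accuracy on seen classes and the accuracy on unseen classes. The following is a minimal sketch under that assumption; the abstract does not spell out the formula, and the function name and input values are hypothetical illustrations, not results from the paper.

```python
def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """Harmonic mean of seen-class and unseen-class accuracy.

    Assumes the usual GZSL definition HM = 2 * S * U / (S + U);
    this is not taken from the paper itself.
    """
    total = seen_acc + unseen_acc
    if total == 0:
        # Both accuracies are zero, so the score is defined as zero here.
        return 0.0
    return 2 * seen_acc * unseen_acc / total


# Illustrative inputs only (not results reported for TSART).
balanced = harmonic_mean(0.5, 0.5)  # equal accuracy on both class groups
skewed = harmonic_mean(0.9, 0.1)    # strong on seen classes, weak on unseen
```

With these illustrative inputs, both cases have the same arithmetic mean (0.5), but the skewed case scores only about 0.18. This is why HM is a useful summary for GZSL: a model cannot score well by doing well on seen classes alone.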

Keywords: audio-visual zero-shot learning; transformer
JEL-codes: C
Date: 2024

Downloads:
https://www.mdpi.com/2227-7390/12/14/2200/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/14/2200/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:14:p:2200-:d:1434495

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:12:y:2024:i:14:p:2200-:d:1434495