DA-GAN: Dual Attention Generative Adversarial Network for Cross-Modal Retrieval
Liewu Cai, Lei Zhu, Hongyan Zhang and Xinghui Zhu
Additional contact information
All authors: College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
Future Internet, 2022, vol. 14, issue 2, 1-23
Abstract:
Cross-modal retrieval aims to retrieve samples of one modality via queries from another, and is an active topic in the multimedia community. However, two main challenges, the heterogeneity gap and semantic interaction across modalities, have not yet been addressed effectively. Reducing the heterogeneity gap improves cross-modal similarity measurement, while modeling cross-modal semantic interaction captures semantic correlations more accurately. To this end, this paper presents a novel end-to-end framework, the Dual Attention Generative Adversarial Network (DA-GAN): an adversarial semantic representation model with a dual attention mechanism comprising intra-modal and inter-modal attention. Intra-modal attention focuses on the important semantic features within a modality, while inter-modal attention explores the semantic interaction between modalities and thereby represents high-level semantic correlations more precisely. A dual adversarial learning strategy is designed to generate modality-invariant representations, which efficiently reduces cross-modal heterogeneity. Experiments on three commonly used benchmarks show that DA-GAN outperforms competing methods.
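The record carries no code; as a rough illustration of the architecture the abstract describes, the following minimal PyTorch sketch wires together the two attention types and a modality discriminator. Every class name, dimension, and pooling choice below is a hypothetical assumption for illustration, not the authors' implementation.

# Illustrative sketch only -- not the authors' released code. Assumes image
# regions and text words are already encoded into feature sequences; all
# names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntraModalAttention(nn.Module):
    """Self-attention over the regions/words of a single modality."""
    def __init__(self, dim):
        super().__init__()
        self.query, self.key, self.value = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, x):  # x: (batch, n, dim)
        attn = torch.softmax(
            self.query(x) @ self.key(x).transpose(1, 2) / x.size(-1) ** 0.5, dim=-1)
        return attn @ self.value(x)  # re-weighted intra-modal features

class InterModalAttention(nn.Module):
    """Cross-attention: one modality's features attend to the other's."""
    def __init__(self, dim):
        super().__init__()
        self.query, self.key, self.value = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, x, y):  # x attends to y
        attn = torch.softmax(
            self.query(x) @ self.key(y).transpose(1, 2) / x.size(-1) ** 0.5, dim=-1)
        return attn @ self.value(y)

class ModalityDiscriminator(nn.Module):
    """Adversary that tries to tell image embeddings from text embeddings;
    fooling it pushes the encoders toward modality-invariant codes."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(),
                                 nn.Linear(dim // 2, 1))

    def forward(self, z):  # z: (batch, dim) -> logit: image vs. text
        return self.net(z)

if __name__ == "__main__":
    dim = 256
    img = torch.randn(4, 36, dim)  # e.g., 36 image regions
    txt = torch.randn(4, 20, dim)  # e.g., 20 words
    intra, inter, disc = IntraModalAttention(dim), InterModalAttention(dim), ModalityDiscriminator(dim)
    img_feat = inter(intra(img), txt).mean(dim=1)  # fused image embedding
    txt_feat = inter(intra(txt), img).mean(dim=1)  # fused text embedding
    # Adversarial term: discriminator labels image = 1, text = 0.
    d_loss = (F.binary_cross_entropy_with_logits(disc(img_feat), torch.ones(4, 1)) +
              F.binary_cross_entropy_with_logits(disc(txt_feat), torch.zeros(4, 1)))
    print(img_feat.shape, txt_feat.shape, d_loss.item())

In a full DA-GAN-style setup the attention modules would play the generator role, trained both to fool the discriminator (yielding modality-invariant embeddings) and to preserve semantic labels; the sketch shows only the adversarial term.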
Keywords: cross-modal retrieval; deep representation learning; generative adversarial network; intra-modal attention; inter-modal attention
JEL-codes: O3
Date: 2022
Citations: 2 (in EconPapers)
Downloads:
https://www.mdpi.com/1999-5903/14/2/43/pdf (application/pdf)
https://www.mdpi.com/1999-5903/14/2/43/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:14:y:2022:i:2:p:43-:d:736384
Future Internet is currently edited by Ms. Grace You