Deep Adversarial Learning Triplet Similarity Preserving Cross-Modal Retrieval Algorithm
Guokun Li, Zhen Wang, Shibo Xu, Chuang Feng, Xiaohan Yang, Nannan Wu and Fuzhen Sun
Additional contact information: all authors are with the School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China
Mathematics, 2022, vol. 10, issue 15, 1-16
Abstract:
Cross-modal retrieval returns nearest neighbors from a different modality, such as images for a text query or texts for an image query. However, inconsistent distributions and diverse representations make it difficult to directly measure the similarity between samples from different modalities, which causes a heterogeneity gap. To bridge this gap, we propose a deep adversarial learning triplet similarity preserving cross-modal retrieval algorithm that maps samples from different modalities into a common space, where their feature representations preserve both the original inter-modal and intra-modal semantic similarity relationships. During training, we employ GANs, which have advantages in modeling data distributions and learning discriminative representations, to learn the features of each modality; as a result, the feature distributions of the different modalities are aligned. Many cross-modal retrieval algorithms preserve only the inter-modal similarity relationship, which makes their nearest-neighbor retrieval results vulnerable to noise. In contrast, we establish a triplet similarity preserving function that simultaneously preserves the inter-modal similarity relationship in the common space and the intra-modal similarity relationship in each modal space, so the proposed algorithm is robust to noise. In each modal space, to ensure that the generated features carry the same semantic information as the sample labels, we establish a linear classifier and require that the classification results of the generated features be consistent with the sample labels. We conducted comparative cross-modal retrieval experiments on two widely used benchmark datasets, Pascal Sentence and Wikipedia. For the image-to-text task, the proposed method improved the mAP values by 1% and 0.7% on the Pascal Sentence and Wikipedia datasets, respectively; for the text-to-image task, it improved the mAP values by 0.6% and 0.8%, respectively. The experimental results show that the proposed algorithm outperforms the other state-of-the-art methods.
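To make the triplet similarity preserving idea concrete, the following is a minimal PyTorch sketch. The framework choice, the names CommonSpaceEncoder and triplet_preserving_loss, and all dimensions are assumptions for illustration, not details taken from the paper. It shows modality encoders that project features into a common space, a linear classifier that enforces label consistency, and an inter-modal triplet loss; the intra-modal triplets and the adversarial (GAN) terms described in the abstract would be added analogously.

# Minimal sketch, assuming a PyTorch implementation; names and dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonSpaceEncoder(nn.Module):
    """Projects a modality-specific feature vector into the shared common space."""
    def __init__(self, in_dim, common_dim, num_classes):
        super().__init__()
        self.project = nn.Sequential(
            nn.Linear(in_dim, common_dim), nn.ReLU(),
            nn.Linear(common_dim, common_dim),
        )
        # Linear classifier: generated features must predict the sample labels.
        self.classifier = nn.Linear(common_dim, num_classes)

    def forward(self, x):
        z = F.normalize(self.project(x), dim=-1)  # unit-norm common-space feature
        return z, self.classifier(z)

def triplet_preserving_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss on common-space features."""
    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)

# Usage sketch (batch construction and feature sizes are assumptions).
img_enc = CommonSpaceEncoder(in_dim=4096, common_dim=256, num_classes=10)
txt_enc = CommonSpaceEncoder(in_dim=300, common_dim=256, num_classes=10)

img = torch.randn(8, 4096)      # image features (e.g., CNN activations)
txt_pos = torch.randn(8, 300)   # text features sharing the image's label
txt_neg = torch.randn(8, 300)   # text features with a different label
labels = torch.randint(0, 10, (8,))

z_img, logits_img = img_enc(img)
z_pos, _ = txt_enc(txt_pos)
z_neg, _ = txt_enc(txt_neg)

# Inter-modal triplet: an image should lie closer to a matching text than to a non-matching one.
inter = triplet_preserving_loss(z_img, z_pos, z_neg)
# Label-consistency term for the image branch (the text branch is handled analogously).
cls = F.cross_entropy(logits_img, labels)
loss = inter + cls  # intra-modal triplets and the GAN alignment terms would be added here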
Keywords: cross-modal retrieval; generative adversarial network; triplet similarity preserving; deep representation learning
JEL-codes: C
Date: 2022
Downloads: (external link)
https://www.mdpi.com/2227-7390/10/15/2585/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/15/2585/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:15:p:2585-:d:871228