Bilingual–Visual Consistency for Multimodal Neural Machine Translation
Yongwen Liu,
Dongqing Liu and
Shaolin Zhu
Additional contact information
Yongwen Liu: College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China
Dongqing Liu: National Engineering Laboratory for Internet Medical Systems and Applications, Zhengzhou University, Zhengzhou 450052, China
Shaolin Zhu: College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China
Mathematics, 2024, vol. 12, issue 15, 1-18
Abstract:
Current multimodal neural machine translation (MNMT) approaches focus primarily on ensuring consistency between visual annotations and the source language, often overlooking broader multimodal coherence, including target–visual and bilingual–visual alignment. In this paper, we propose a novel approach that leverages target–visual consistency (TVC) and bilingual–visual consistency (BiVC) to improve MNMT performance. Our method uses visual annotations depicting concepts shared across bilingual parallel sentences to enhance multimodal coherence in translation. We exploit target–visual harmony by extracting contextual cues from visual annotations during auto-regressive decoding, incorporating vital future context to improve the target sentence representation. Additionally, we introduce a consistency loss that promotes semantic congruence between bilingual sentence pairs and their visual annotations, fostering tighter integration of the textual and visual modalities. Extensive experiments on diverse multimodal translation datasets demonstrate the effectiveness of our approach. This visually aware, data-driven framework opens opportunities for intelligent learning, adaptive control, and robust distributed optimization of multi-agent systems in uncertain, complex environments. By fusing multimodal data and machine learning, our method paves the way for novel control paradigms capable of handling the dynamics and constraints of real-world multi-agent applications.
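The consistency loss described in the abstract lends itself to a compact illustration. Below is a minimal PyTorch sketch of the BiVC idea: source, target, and visual-annotation embeddings are projected into a shared space, and a cosine-based penalty discourages disagreement between the bilingual pair and the image. The class name BiVCLoss, the projection layers, the pooled-embedding inputs, and the exact loss form are illustrative assumptions, not the paper's published formulation.

```python
# Minimal sketch of a bilingual–visual consistency (BiVC) loss, assuming
# pooled sentence and image embeddings. All names, dimensions, and the
# cosine-based loss form are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiVCLoss(nn.Module):
    """Project source, target, and visual-annotation embeddings into a
    shared space and penalize semantic disagreement among the views."""

    def __init__(self, text_dim: int, image_dim: int, shared_dim: int = 256):
        super().__init__()
        self.src_proj = nn.Linear(text_dim, shared_dim)   # source sentence
        self.tgt_proj = nn.Linear(text_dim, shared_dim)   # target sentence
        self.img_proj = nn.Linear(image_dim, shared_dim)  # visual annotation

    def forward(self, src_repr, tgt_repr, img_repr):
        # L2-normalize so the dot product equals cosine similarity.
        s = F.normalize(self.src_proj(src_repr), dim=-1)
        t = F.normalize(self.tgt_proj(tgt_repr), dim=-1)
        v = F.normalize(self.img_proj(img_repr), dim=-1)
        # 1 - cos(x, y) is zero when two views agree perfectly; summing
        # the source–visual and target–visual terms covers both sides of
        # the bilingual pair.
        src_vis = (1.0 - (s * v).sum(dim=-1)).mean()
        tgt_vis = (1.0 - (t * v).sum(dim=-1)).mean()
        return src_vis + tgt_vis

# Usage with random stand-in features (batch of 8):
loss_fn = BiVCLoss(text_dim=512, image_dim=2048)
src = torch.randn(8, 512)    # pooled source-sentence encodings
tgt = torch.randn(8, 512)    # pooled target-sentence encodings
img = torch.randn(8, 2048)   # pooled visual-annotation features
consistency = loss_fn(src, tgt, img)
# total = translation_loss + lambda_bivc * consistency  (hypothetical weight)
```

In a full MNMT system, this term would be added to the standard translation cross-entropy loss with a weighting coefficient; the TVC signal described in the abstract would additionally feed the projected visual features into the decoder at each auto-regressive step.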
Keywords: multimodal neural machine translation; bilingual–visual harmony; visual annotation
JEL-codes: C
Date: 2024
Downloads:
https://www.mdpi.com/2227-7390/12/15/2361/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/15/2361/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:15:p:2361-:d:1445040
Mathematics is currently edited by Ms. Emma He