Neural Machine Translation with CARU-Embedding Layer and CARU-Gated Attention Layer

Im, Sio-Kei; Chan, Ka-Hou

Sio-Kei Im and Ka-Hou Chan
Additional contact information
Sio-Kei Im: Faculty of Applied Sciences, Macao Polytechnic University, Macau, China
Ka-Hou Chan: Faculty of Applied Sciences, Macao Polytechnic University, Macau, China

Mathematics, 2024, vol. 12, issue 7, 1-19

Abstract: The attention mechanism performs well on the Neural Machine Translation (NMT) task, but it depends heavily on the context vectors generated by the attention network to predict target words. This reliance raises the issue of long-term dependencies. Predicates are very commonly combined with postpositions in sentences, and the same predicate may have different meanings when combined with different postpositions, which poses an additional challenge for NMT. In this work, we observe that the embedding vectors of different target tokens can be classified by part of speech. We therefore analyze the Content-Adaptive Recurrent Unit (CARU), originally proposed for Natural Language Processing (NLP), and apply it to our attention model (CAAtt) and embedding layer (CAEmbed). By encoding the source sentence together with the current decoded feature through the CARU, CAAtt achieves content-adaptive translation representations, whose attention weights are enhanced by our proposed L1expNx normalization. Furthermore, CAEmbed aims to alleviate long-term dependencies in the target language through a partially recurrent design that performs feature extraction from a local perspective. Experiments on the WMT14, WMT17, and Multi30k translation tasks show that the proposed model improves BLEU scores and convergence over the plain attention-based NMT model. We also investigate the attention weights generated by the proposed approaches, which indicate that refinement over different adposition combinations can lead to different interpretations. Specifically, this work provides local attention to some specific phrases translated in our experiments. The results demonstrate that our approach is effective in improving performance and in achieving a more reasonable attention distribution compared to state-of-the-art models.
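
For readers unfamiliar with the term, the "context vector" mentioned in the abstract is, in a conventional attention-based model, a weighted sum of encoder states, with weights derived from how well each state matches the current decoder state. The following is a minimal sketch of that baseline only, assuming dot-product scoring and ordinary softmax normalization. It is not the paper's CAAtt layer, CARU, or L1expNx normalization, none of which are specified in this record, and the names `softmax` and `attention_context` are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Assumption: plain dot-product scoring between the decoder state and
    each encoder state. This is the generic baseline, not the paper's method.
    """
    # One score per source position: dot product with the decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    # Normalize scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three source positions with 2-dimensional states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.5, 0.5]
weights, context = attention_context(decoder_state, encoder_states)
```

In this toy input the third source position has the largest dot product with the decoder state, so it receives the largest weight. Because the target word is predicted from `context`, any misallocation of these weights propagates directly into the prediction, which is the dependence the abstract describes.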

Keywords: neural network; Neural Machine Translation (NMT); Natural Language Processing (NLP); attention mechanism; Content-Adaptive Recurrent Unit (CARU)
JEL-codes: C
Date: 2024

Downloads:
https://www.mdpi.com/2227-7390/12/7/997/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/7/997/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:7:p:997-:d:1364975

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.