
Keeping Models Consistent between Pretraining and Translation for Low-Resource Neural Machine Translation

Wenbo Zhang, Xiao Li, Yating Yang, Rui Dong and Gongxu Luo
Additional contact information
All authors: Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China

Future Internet, 2020, vol. 12, issue 12, 1-13

Abstract: Recently, model pretraining has been successfully applied to unsupervised and semi-supervised neural machine translation. The cross-lingual language model approach uses a pretrained masked language model to initialize the encoder and decoder of the translation model, which greatly improves translation quality. However, because the numbers of layers do not match, the pretrained model can initialize only part of the decoder’s parameters. In this paper, we use a layer-wise coordination transformer and a consistent pretraining translation transformer instead of the vanilla transformer as the translation model. The former has only an encoder; the latter has an encoder and a decoder that share exactly the same parameters. Both models guarantee that every parameter of the translation model can be initialized from the pretrained model. Experiments on Chinese–English and English–German datasets show that, compared with the vanilla transformer baseline, our models achieve better performance with fewer parameters when the parallel corpus is small.
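To make the layer-consistency idea concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' code: a translation model whose encoder and decoder are one shared layer stack, so a pretrained masked language model of the same depth can initialize every parameter. All class names, dimensions, and the file name are illustrative assumptions.

```python
# Hypothetical sketch (assumed names, not the paper's implementation):
# the encoder and decoder are the *same* layer stack, so a pretrained
# masked LM of equal depth initializes every translation-model parameter.
import torch
import torch.nn as nn

class SharedLayer(nn.Module):
    """One transformer layer reused for both encoding and decoding."""
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(),
            nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, attn_mask=None):
        h, _ = self.self_attn(x, x, x, attn_mask=attn_mask)
        x = self.norm1(x + h)
        return self.norm2(x + self.ffn(x))

class ConsistentTranslationModel(nn.Module):
    """Layer-wise coordination style: one shared stack attends over the
    concatenated source and target sequence instead of separate
    encoder and decoder stacks."""
    def __init__(self, vocab_size=32000, num_layers=6, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            SharedLayer(d_model) for _ in range(num_layers))
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        x = self.embed(torch.cat([src_ids, tgt_ids], dim=1))
        seq_len = x.size(1)
        # Causal mask so target positions cannot attend to future tokens
        # (simplified: the source side is also masked causally here).
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        for layer in self.layers:
            x = layer(x, attn_mask=mask)
        # Predict only the target-side positions.
        return self.proj(x[:, src_ids.size(1):])

    def load_pretrained(self, mlm_state_dict):
        # If the masked LM was built from these same modules, every key
        # matches and strict loading covers the entire translation model.
        self.load_state_dict(mlm_state_dict, strict=True)

# Usage sketch: pretrain an identically shaped masked LM, then
#   model = ConsistentTranslationModel()
#   model.load_pretrained(torch.load("xlm_pretrained.pt"))  # assumed file
```

Because the decoder adds no parameters beyond the shared stack, strict loading of the pretrained state dict covers the whole translation model, which is the pretraining–translation consistency the abstract describes.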

Keywords: low-resource neural machine translation; monolingual data; pretraining; transformer
JEL-codes: O3
Date: 2020

Downloads:
https://www.mdpi.com/1999-5903/12/12/215/pdf (application/pdf)
https://www.mdpi.com/1999-5903/12/12/215/ (text/html)



Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:12:y:2020:i:12:p:215-:d:452591


Future Internet is currently edited by Ms. Grace You


 
Handle: RePEc:gam:jftint:v:12:y:2020:i:12:p:215-:d:452591