A Survey on Evaluation Metrics for Machine Translation
Seungjun Lee,
Jungseob Lee,
Hyeonseok Moon,
Chanjun Park,
Jaehyung Seo,
Sugyeong Eo,
Seonmin Koo and
Heuiseok Lim
Additional contact information
All authors: Department of Computer Science and Engineering, Korea University, Seoul 02841, Republic of Korea
Mathematics, 2023, vol. 11, issue 4, 1-22
Abstract:
The success of the Transformer architecture has spurred increased interest in machine translation (MT). The translation quality of neural network-based MT now surpasses that of translations produced by statistical methods. This growth in MT research has driven the development of accurate automatic evaluation metrics that allow us to track the performance of MT systems. However, automatically evaluating and comparing MT systems remains a challenging task. Several studies have shown that traditional metrics (e.g., BLEU, TER) perform poorly at capturing the semantic similarity between MT outputs and human reference translations. To improve on this, various evaluation metrics based on the Transformer architecture have been proposed, yet a systematic and comprehensive literature review of these metrics is still missing. A survey of the existing automatic evaluation metrics for MT is therefore needed to enable both established and new researchers to quickly understand how MT evaluation has evolved over the past few years. In this survey, we present the trends in automatic evaluation metrics. To better situate developments in the field, we provide a taxonomy of the automatic evaluation metrics and explain the key contributions and shortcomings of each. In addition, we select representative metrics from the taxonomy and conduct experiments to analyze related problems. Finally, based on these experiments, we discuss the limitations of current automatic metric studies and offer suggestions for further research to improve automatic evaluation metrics.
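As an illustration of the surface-overlap behavior the abstract criticizes, the following is a minimal sketch of sentence-level BLEU (clipped n-gram precision combined with a brevity penalty). This is not code from the surveyed paper; the function names and the smoothing constant are illustrative assumptions, and production toolkits such as sacreBLEU handle tokenization and smoothing more carefully.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n), scaled by a brevity penalty.
    Whitespace tokenization and the 1e-9 floor are simplifications."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # floor avoids log(0)
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Under this sketch an exact match scores 1.0, while a synonymous paraphrase (e.g., "feline" in place of "cat") is penalized purely for its surface mismatch, which illustrates why n-gram metrics fail to capture semantic similarity.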
Keywords: machine translation; automatic evaluation metric; deep learning; Transformer
JEL-codes: C
Date: 2023
Downloads: (external link)
https://www.mdpi.com/2227-7390/11/4/1006/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/4/1006/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:4:p:1006-:d:1070323
Mathematics is currently edited by Ms. Emma He