Evaluation of Chinese Natural Language Processing System Based on Metamorphic Testing
Lingzi Jin, Zuohua Ding and Huihui Zhou
Lingzi Jin: School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
Zuohua Ding: School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
Huihui Zhou: School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
Mathematics, 2022, vol. 10, issue 8, 1-27
Abstract:
A natural language processing (NLP) system enables effective communication between humans and computers in natural language. Because existing evaluation methods rely on large amounts of labeled data and human judgment, systematically evaluating the quality of such systems remains challenging. In this article, we use metamorphic testing to evaluate NLP systems from the user’s perspective, helping users better understand the functionality of these systems and select the one appropriate to their specific needs. We define three metamorphic relation patterns, each focusing on characteristics of a different aspect of natural language processing. On this basis, we define seven abstract metamorphic relations and choose three tasks (text similarity, text summarization, and text classification) to evaluate system quality, with Chinese as the target language. We instantiate the abstract metamorphic relations for these tasks, generating seven specific metamorphic relations for each task. We then check whether the metamorphic relations are satisfied for each task and use the results to evaluate the quality and robustness of the NLP system without reference outputs. We apply metamorphic testing to three mainstream NLP systems (BaiduCloud API, AliCloud API, and TencentCloud API) on the PAWS-X, LCSTS, and THUCNews datasets. The experiments reveal the advantages and disadvantages of each system and further show that metamorphic testing can effectively test NLP systems without annotated data.
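As a rough illustration of the idea of checking a metamorphic relation without reference output, the sketch below tests one relation for the text similarity task. Everything in it is an assumption made for illustration: the symmetry relation (swapping the two input sentences should not change the score) is not claimed to be one of the paper's seven relations, and `toy_similarity` and `biased_similarity` are hypothetical stand-ins for a real cloud API, which the abstract does not specify.

```python
from typing import Callable, Iterable, Set, Tuple

# A similarity system maps two sentences to a score. Hypothetical interface.
Similarity = Callable[[str, str], float]


def char_bigrams(text: str) -> Set[str]:
    """Return the set of character bigrams of a sentence."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def toy_similarity(a: str, b: str) -> float:
    """Stand-in system under test: Jaccard overlap of character bigrams."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def biased_similarity(a: str, b: str) -> float:
    """Deliberately asymmetric stand-in: share of a's bigrams found in b."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    return len(ga & gb) / len(ga) if ga else 1.0


def symmetry_violation_rate(
    system: Similarity,
    pairs: Iterable[Tuple[str, str]],
    tol: float = 1e-9,
) -> float:
    """Fraction of pairs where sim(a, b) and sim(b, a) differ by more than tol.

    No labeled similarity score is needed: only the relation between the
    source output and the follow-up output is checked.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one sentence pair is required")
    violations = sum(
        1 for a, b in pairs if abs(system(a, b) - system(b, a)) > tol
    )
    return violations / len(pairs)


# Unlabeled Chinese sentence pairs, made up for this example.
pairs = [
    ("今天天气很好", "今天天气不错"),
    ("我喜欢自然语言处理", "我喜欢处理"),
]

print(symmetry_violation_rate(toy_similarity, pairs))
print(symmetry_violation_rate(biased_similarity, pairs))
```

The symmetric stand-in never violates the relation, while the asymmetric one does on at least one pair, so the violation rate separates the two without any annotated data. A real evaluation would replace the stand-ins with calls to the system under test.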
Keywords: natural language processing; metamorphic testing; quality assessment
JEL-codes: C
Date: 2022
Downloads:
https://www.mdpi.com/2227-7390/10/8/1276/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/8/1276/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:8:p:1276-:d:792080
Mathematics is currently edited by Ms. Emma He