
Evaluating and Enhancing the Robustness of Sustainable Neural Relationship Classifiers Using Query-Efficient Black-Box Adversarial Attacks

Ijaz Ul Haq, Zahid Younas Khan, Arshad Ahmad, Bashir Hayat, Asif Khan, Ye-Eun Lee and Ki-Il Kim
Additional contact information
Ijaz Ul Haq: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100811, China
Zahid Younas Khan: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100811, China
Arshad Ahmad: Department of IT and Computer Science, Pak-Austria Fachhochschule Institute of Applied Sciences and Technology, Haripur 22620, Pakistan
Bashir Hayat: Institute of Management Sciences Peshawar, Peshawar 25100, Pakistan
Asif Khan: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100811, China
Ye-Eun Lee: Department of Computer Science and Engineering, Chungnam National University, Daejeon 34134, Korea
Ki-Il Kim: Department of Computer Science and Engineering, Chungnam National University, Daejeon 34134, Korea

Sustainability, 2021, vol. 13, issue 11, 1-25

Abstract: Neural relation extraction (NRE) models are the backbone of various machine learning tasks, including knowledge base enrichment, information extraction, and document summarization. Despite their vast popularity, the vulnerabilities of these models remain largely unknown; this is of high concern given their growing use in security-sensitive, sustainability-related applications such as question answering and machine translation. In this study, we demonstrate that NRE models are inherently vulnerable to adversarially crafted text that contains imperceptible modifications of the original input but can mislead the target model. Specifically, we propose a novel, sustainable term frequency-inverse document frequency (TFIDF)-based black-box adversarial attack to evaluate the robustness of state-of-the-art CNN-, GCN-, LSTM-, and BERT-based models on two benchmark RE datasets. Compared with white-box adversarial attacks, black-box attacks impose further constraints on the query budget; thus, efficient black-box attacks remain an open problem. By applying TFIDF to the correctly classified sentences of each class label in the test set, the proposed query-efficient method reduces the number of queries to the target model for identifying important text items by up to 70%. Based on these items, we design both character- and word-level perturbations to generate adversarial examples. The proposed attack successfully reduces the average F1 score of six representative models from 80% to below 20%. Human evaluators judged the generated adversarial examples to be semantically similar to the originals. Moreover, we discuss defense strategies that mitigate such attacks and potential countermeasures that could be deployed to improve the sustainability of the proposed scheme.
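
The abstract describes a two-step procedure: TFIDF weights, computed over each class label's correctly classified sentences, identify the most informative words of a target sentence, and those words are then perturbed at the character or word level. The following minimal Python sketch illustrates that general idea only; it is not the authors' implementation, the query loop against the target model is omitted, scikit-learn's TfidfVectorizer is assumed to be available, and all function names and toy data are hypothetical.

from sklearn.feature_extraction.text import TfidfVectorizer

def rank_words_by_tfidf(class_sentences, target_sentence, top_k=3):
    # Fit TFIDF on one class's correctly classified sentences, then score
    # the target sentence's words; high-weight words are attack candidates.
    vectorizer = TfidfVectorizer()
    vectorizer.fit(class_sentences)
    vocab = vectorizer.vocabulary_
    weights = vectorizer.transform([target_sentence]).toarray()[0]
    candidates = {w for w in target_sentence.lower().split() if w in vocab}
    ranked = sorted(candidates, key=lambda w: weights[vocab[w]], reverse=True)
    return ranked[:top_k]

def char_swap(word):
    # Character-level perturbation: swap two adjacent interior characters,
    # a small change that is hard for a human reader to notice.
    if len(word) < 4:
        return word
    i = len(word) // 2
    return word[:i - 1] + word[i] + word[i - 1] + word[i + 1:]

def perturb(sentence, important_words):
    # Apply the character swap only to the important words; a word-level
    # variant would substitute synonyms here instead.
    return " ".join(char_swap(w) if w.lower() in important_words else w
                    for w in sentence.split())

# Toy usage: sentences assumed correctly classified under one relation label.
class_sents = ["the company acquired the startup", "the firm acquired a rival"]
target = "the investor acquired the business"
top = set(rank_words_by_tfidf(class_sents, target))
print(top, "->", perturb(target, top))

Because the important words are found from the test set's TFIDF statistics rather than by probing the classifier word by word, the target model only needs to be queried to verify candidate adversarial sentences, which is the source of the query savings the abstract reports.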

Keywords: robust; sustainability; adversarial attack; black-box attack; TFIDF; relation extraction; deep neural networks
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2021
References: View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
https://www.mdpi.com/2071-1050/13/11/5892/pdf (application/pdf)
https://www.mdpi.com/2071-1050/13/11/5892/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:13:y:2021:i:11:p:5892-:d:560896

Sustainability is currently edited by Ms. Alexandra Wu

More articles in Sustainability from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Handle: RePEc:gam:jsusta:v:13:y:2021:i:11:p:5892-:d:560896