Robust Benchmark for Propagandist Text Detection and Mining High-Quality Data

Ahmad, Pir Noman; Liu, Yuanchao; Ali, Gauhar; Wani, Mudasir Ahmad; ElAffendi, Mohammed

Pir Noman Ahmad: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Yuanchao Liu: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Gauhar Ali: EIAS Data Science and Blockchain Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia
Mudasir Ahmad Wani: EIAS Data Science and Blockchain Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia
Mohammed ElAffendi: EIAS Data Science and Blockchain Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia

Mathematics, 2023, vol. 11, issue 12, 1-23

Abstract: Social media, fake news, and a variety of propaganda strategies have all contributed to a rise in online misinformation over the past decade. Because high-quality data are scarce, existing datasets are insufficient for training deep-learning models, which prevents reliable identification of propaganda. We approach the problem with natural language processing, building a deep-learning system that automatically identifies propaganda in news articles. To help the research community detect propaganda in news text, this study introduces the propaganda texts (ProText) library. Entries in ProText receive truthfulness labels after being verified manually and automatically with fact-checking methods. The study also proposes a fine-tuned Robustly Optimized BERT Pretraining Approach (RoBERTa) model combined with word embeddings for multi-label, multi-class text classification. Through experiments and comparative analysis, we examine critical open issues in the task and work toward answers. The approach achieves accuracies of 90%, 75%, 68%, and 65% on ProText, PTC, TSHP-17, and Qprop, respectively. Big-data methods, particularly deep-learning models, can help compensate for inadequate data within this new text classification strategy. We call on researchers to collaborate in acquiring and exchanging datasets and in developing a standard for organizing, labeling, and fact-checking them.
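In a multi-label, multi-class setting, a single text can carry several labels at once, so the classifier makes an independent yes/no decision per label rather than picking one class. The following is a minimal sketch, assuming a classifier (such as a fine-tuned RoBERTa) that emits one logit per label. Each logit passes through its own sigmoid, and the label is kept if the probability clears a threshold. The label names, the 0.5 threshold, and the exact-match (subset) accuracy metric are illustrative assumptions, not details taken from the paper, which does not state here how its reported accuracies were computed.

```python
import math

# Hypothetical label set for illustration only; NOT the ProText label schema.
LABELS = ["loaded_language", "name_calling", "flag_waving", "doubt"]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Keep every label whose independent sigmoid probability meets the threshold.

    Unlike softmax (single-label), several labels, or none, may be returned.
    The 0.5 default threshold is an assumption for illustration.
    """
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


def subset_accuracy(pred: list[set[str]], gold: list[set[str]]) -> float:
    """Fraction of examples whose predicted label set exactly matches the gold set."""
    hits = sum(p == g for p, g in zip(pred, gold))
    return hits / len(gold)


if __name__ == "__main__":
    # One logit per label, e.g. from a classification head on top of an encoder.
    print(decode_multilabel([2.1, -0.3, 0.8, -1.5]))
    print(subset_accuracy([{"doubt"}, {"flag_waving"}], [{"doubt"}, {"name_calling"}]))
```

Subset accuracy is deliberately strict: a prediction counts only if every label matches. Per-label measures such as Hamming accuracy or micro-averaged F1 are common, more lenient alternatives.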

Keywords: misinformation; propaganda; fact-check; ProText; big data; social media
JEL-codes: C
Date: 2023
Downloads:
https://www.mdpi.com/2227-7390/11/12/2668/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/12/2668/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:12:p:2668-:d:1169221

Publisher: MDPI