Instagram-Based Benchmark Dataset for Cyberbullying Detection in Arabic Text
Reem ALBayari and
Sherief Abdallah
Additional contact information
Reem ALBayari: Higher College of Technology, Abu Dhabi P.O. Box 25026, United Arab Emirates
Sherief Abdallah: Faculty of Engineering and IT, The British University in Dubai, Dubai P.O. Box 345015, United Arab Emirates
Data, 2022, vol. 7, issue 7, 1-11
Abstract:
(1) Background: the ability to use social media to communicate without revealing one’s real identity has created an attractive setting for cyberbullying. Several studies targeted social media to collect their datasets with the aim of automatically detecting offensive language. However, the majority of the datasets were in English, not in Arabic. Even the few Arabic datasets that were collected, none focused on Instagram despite being a major social media platform in the Arab world. (2) Methods: we use the official Instagram APIs to collect our dataset. To consider the dataset as a benchmark, we use SPSS (Kappa statistic) to evaluate the inter-annotator agreement (IAA), as well as examine and evaluate the performance of various learning models (LR, SVM, RFC, and MNB). (3) Results: in this research, we present the first Instagram Arabic corpus (sub-class categorization (multi-class)) focusing on cyberbullying. The dataset is primarily designed for the purpose of detecting offensive language in texts. We end up with 200,000 comments, of which 46,898 comments were annotated by three human annotators. The results show that the SVM classifier outperforms the other classifiers, with an F1 score of 69% for bullying comments and 85 percent for positive comments.
Keywords: cyberbullying; offensive language; Arabic dialect (search for similar items in EconPapers)
JEL-codes: C8 C80 C81 C82 C83 (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/2306-5729/7/7/83/pdf (application/pdf)
https://www.mdpi.com/2306-5729/7/7/83/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:7:y:2022:i:7:p:83-:d:845163
Access Statistics for this article
Data is currently edited by Ms. Cecilia Yang
More articles in Data from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().