OLF-ML: An Offensive Language Framework for Detection, Categorization, and Offense Target Identification Using Text Processing and Machine Learning Algorithms
MD. Nahid Hasan,
Kazi Shadman Sakib,
Taghrid Tahani Preeti,
Jeza Allohibi,
Abdulmajeed Atiah Alharbi and
Jia Uddin
Additional contact information
MD. Nahid Hasan: Department of Computer Science and Engineering, School of Data and Sciences, Brac University, Dhaka 1212, Bangladesh
Kazi Shadman Sakib: Department of Computer Science and Engineering, University of Dhaka, Dhaka 1000, Bangladesh
Taghrid Tahani Preeti: Department of Computer Science and Engineering, School of Data and Sciences, Brac University, Dhaka 1212, Bangladesh
Jeza Allohibi: Department of Mathematics, Taibah University, Madinah 42353, Saudi Arabia
Abdulmajeed Atiah Alharbi: Department of Mathematics, Taibah University, Madinah 42353, Saudi Arabia
Jia Uddin: Artificial Intelligence and Big Data Department, Endicott College, Woosong University, Daejeon 34606, Republic of Korea
Mathematics, 2024, vol. 12, issue 13, 1-18
Abstract:
The pervasiveness of offensive language on social media emphasizes the necessity of automated systems for identifying and categorizing such content. Effective identification and categorization of this content is essential to ensure a more secure online environment and to improve communication. However, existing research encounters challenges such as limited datasets and biased model performance, which hinder progress in this domain. To address these challenges, this research presents a comprehensive framework that simplifies the utilization of support vector machines (SVM), random forest (RF), and artificial neural networks (ANN). Using the Offensive Language Identification Dataset (OLID), the proposed methodology yields notable gains in three tasks: offensive language detection, automatic categorization of offensiveness, and offense target identification. The simulation results indicate that SVM performs best, with accuracy scores of 77%, 88%, and 68%, precision scores of 76%, 87%, and 67%, F1 scores of 57%, 88%, and 68%, and recall rates of 45%, 88%, and 68%, proving to be practically successful in identifying and moderating offensive content on social media. By applying sophisticated preprocessing and meticulous hyperparameter tuning, our model outperforms some earlier research on offensive language detection and categorization tasks.
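The reported F1 score can be related to the reported precision and recall through the standard harmonic-mean definition. The sketch below is a rough consistency check only. It assumes that the first value in each triple belongs to the same task and that the scores describe a single class rather than a macro or weighted average; the abstract states neither. The function name `f1_score` is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        # Convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# First value of the SVM precision and recall triples quoted in the abstract.
# Pairing them as one task is an assumption made for this illustration.
precision_first = 0.76
recall_first = 0.45

f1_first = f1_score(precision_first, recall_first)
print(f"F1 from precision and recall: {f1_first:.2f}")
```

Under these assumptions the result rounds to 0.57, which agrees with the first reported F1 score and shows how a recall of 45% pulls F1 well below the 77% accuracy. For the other two triples, precision and recall differ by only one point, so F1 lands within about one point of both.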
Keywords: machine learning; offensive language detection; offensive language categorization; offensive target identification; SVM; random forest; ANN; OLID
JEL-codes: C
Date: 2024
Downloads:
https://www.mdpi.com/2227-7390/12/13/2123/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/13/2123/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:13:p:2123-:d:1430103
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.