OLF-ML: An Offensive Language Framework for Detection, Categorization, and Offense Target Identification Using Text Processing and Machine Learning Algorithms

Hasan, MD. Nahid; Sakib, Kazi Shadman; Preeti, Taghrid Tahani; Allohibi, Jeza; Alharbi, Abdulmajeed Atiah; Uddin, Jia

MD. Nahid Hasan, Kazi Shadman Sakib, Taghrid Tahani Preeti, Jeza Allohibi, Abdulmajeed Atiah Alharbi and Jia Uddin
Additional contact information
MD. Nahid Hasan: Department of Computer Science and Engineering, School of Data and Sciences, Brac University, Dhaka 1212, Bangladesh
Kazi Shadman Sakib: Department of Computer Science and Engineering, University of Dhaka, Dhaka 1000, Bangladesh
Taghrid Tahani Preeti: Department of Computer Science and Engineering, School of Data and Sciences, Brac University, Dhaka 1212, Bangladesh
Jeza Allohibi: Department of Mathematics, Taibah University, Madinah 42353, Saudi Arabia
Abdulmajeed Atiah Alharbi: Department of Mathematics, Taibah University, Madinah 42353, Saudi Arabia
Jia Uddin: Artificial Intelligence and Big Data Department, Endicott College, Woosong University, Daejeon 34606, Republic of Korea

Mathematics, 2024, vol. 12, issue 13, 1-18

Abstract: The pervasiveness of offensive language on social media underscores the need for automated systems that identify and categorize such content. Effective identification and categorization are essential to a more secure online environment and better communication. However, existing research faces challenges such as limited datasets and biased model performance, which hinder progress in this domain. To address these challenges, this research presents a comprehensive framework that simplifies the use of support vector machines (SVM), random forest (RF), and artificial neural networks (ANN). Using the Offensive Language Identification Dataset (OLID), the proposed methodology yields notable gains in three tasks: offensive language detection, automatic categorization of offensiveness, and offense target identification. The simulation results indicate that SVM performs well across the three tasks, with accuracy scores of 77%, 88%, and 68%, precision scores of 76%, 87%, and 67%, F1 scores of 57%, 88%, and 68%, and recall rates of 45%, 88%, and 68%, indicating that it is practically effective in identifying and moderating offensive content on social media. By applying sophisticated preprocessing and meticulous hyperparameter tuning, our model outperforms some earlier research on offensive language detection and categorization tasks.
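
As a reading aid for the metrics quoted in the abstract, the following minimal sketch computes F1 as the harmonic mean of precision and recall and applies it to the rounded SVM figures for the first task (76% precision, 45% recall). The function name `f1_score` is an illustrative choice, not taken from the article. The check is approximate: the inputs are rounded, and the abstract does not say how scores are averaged over classes, so class-averaged figures need not satisfy this identity exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in [0, 1]."""
    if precision + recall == 0:
        # Convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded SVM figures quoted in the abstract for the first task.
detection_f1 = f1_score(0.76, 0.45)
print(f"{detection_f1:.3f}")  # prints 0.565
```

The result rounds to 57%, matching the F1 score the abstract reports for that task, and shows how a recall of 45% pulls F1 well below the 76% precision.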

Keywords: machine learning; offensive language detection; offensive language categorization; offensive target identification; SVM; random forest; ANN; OLID
JEL-codes: C
Date: 2024

Downloads:
https://www.mdpi.com/2227-7390/12/13/2123/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/13/2123/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:13:p:2123-:d:1430103

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.