Multi-Ideology ISIS/Jihadist White Supremacist (MIWS) Dataset for Multi-Class Extremism Text Classification

Mayur Gaikwad, Swati Ahirrao, Shraddha Phansalkar and Ketan Kotecha
Additional contact information
Mayur Gaikwad: Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune MH 412115, India
Swati Ahirrao: Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune MH 412115, India
Shraddha Phansalkar: MIT Art, Design and Technology University, Pune MH 412201, India
Ketan Kotecha: Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune MH 412115, India

Data, 2021, vol. 6, issue 11, 1-15

Abstract: Social media platforms are a popular channel for extremist organizations to disseminate their perceptions, beliefs, and ideologies. This information is generally based on selective reporting and is subjective in content. Nevertheless, the radical presentation of this disinformation and its reach on social media increase the number of susceptible audiences, so detecting extremist text on social media platforms is a significant area of research. Online extremism research faces three challenges: extremism text datasets are unavailable, little emphasis has been placed on classifying extremism text into propaganda, radicalization, and recruitment classes, and the lack of data validation methods undermines the accuracy of extremism detection. This research addresses these challenges by presenting a multi-ideology, multi-class seed dataset of extremism text: the multi-ideology ISIS/Jihadist White Supremacist (MIWS) dataset, constructed from recent tweets collected from Twitter. The dataset can be used to classify extremist text into common types such as propaganda, radicalization, and recruitment. Additionally, the seed dataset is statistically validated with the coherence score of Latent Dirichlet Allocation (LDA) and with word mover’s distance computed using pretrained Google News vectors. The validation indicates that the dataset is well constructed, with good coherence scores within a topic and appropriate distance measures between topics. This is the first publicly accessible multi-ideology, multi-class extremism text dataset, intended to reinforce research on extremism text detection on social media platforms.
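
The abstract mentions validating the dataset with a topic coherence score. As a rough illustration of what such a score measures, the sketch below computes the UMass coherence of a topic's top words from document co-occurrence counts. This is an assumption made for illustration only: the abstract does not say which coherence measure or library the authors used, and the function name `umass_coherence` and the toy token lists are hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic's top words over tokenized documents.

    Illustrative sketch only; the measure used in the paper may differ.
    `top_words` is ordered from most to least probable.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    # Pair each word with every higher-ranked word that precedes it.
    for l, m in combinations(range(len(top_words)), 2):
        w_l, w_m = top_words[l], top_words[m]
        d_l = doc_freq(w_l)
        if d_l == 0:
            continue  # skip words that never occur in the corpus
        # The +1 smoothing keeps the logarithm defined when the pair never co-occurs.
        score += math.log((doc_freq(w_m, w_l) + 1) / d_l)
    return score


# Hypothetical toy corpus with neutral placeholder tokens.
docs = [["a", "b", "c"], ["a", "b"], ["c", "d"], ["a", "d"]]
frequent_pair = umass_coherence(["a", "b"], docs)  # words that often co-occur
rare_pair = umass_coherence(["b", "d"], docs)      # words that never co-occur
print(frequent_pair, rare_pair)
```

Scores closer to zero indicate that a topic's top words tend to appear in the same documents; strongly negative scores indicate that they rarely do.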

Keywords: artificial intelligence; extremism; disinformation; ideology; propaganda; radicalization; recruitment
JEL-codes: C8 C80 C81 C82 C83
Date: 2021
Downloads: (external link)
https://www.mdpi.com/2306-5729/6/11/117/pdf (application/pdf)
https://www.mdpi.com/2306-5729/6/11/117/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:6:y:2021:i:11:p:117-:d:679373

Data is currently edited by Ms. Cecilia Yang

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jdataj:v:6:y:2021:i:11:p:117-:d:679373