A Novel Text Classification Technique Using Improved Particle Swarm Optimization: A Case Study of Arabic Language

Yousif A. Alhaj, Abdelghani Dahou, Mohammed A. A. Al-qaness, Laith Abualigah, Aaqif Afzaal Abbasi, Nasser Ahmed Obad Almaweri, Mohamed Abd Elaziz and Robertas Damaševičius
Additional contact information
Yousif A. Alhaj: Sanaa Community College, Sanaa 5695, Yemen
Abdelghani Dahou: Mathematics and Computer Science Department, Ahmed Draia University, Adrar 01000, Algeria
Mohammed A. A. Al-qaness: State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
Laith Abualigah: Faculty of Information Technology, Middle East University, Amman 11831, Jordan
Aaqif Afzaal Abbasi: Department of Software Engineering, Foundation University Islamabad, Islamabad 44000, Pakistan
Nasser Ahmed Obad Almaweri: Sanaa Community College, Sanaa 5695, Yemen
Mohamed Abd Elaziz: Faculty of Computer Science and Engineering, Galala University, Suez 435611, Egypt
Robertas Damaševičius: Department of Applied Informatics, Vytautas Magnus University, 44404 Kaunas, Lithuania

Future Internet, 2022, vol. 14, issue 7, 1-18

Abstract: We propose a novel text classification model that aims to improve the performance of Arabic text classification using machine learning techniques. An effective approach to Arabic text classification is to find a suitable feature selection method, with an optimal number of features, alongside the classifier. Although several text classification methods have been proposed for the Arabic language using different techniques, such as feature selection methods, ensembles of classifiers, and discriminative features, choosing the optimal method becomes an NP-hard problem given the huge search space. Therefore, we propose a method called Optimal Configuration Determination for Arabic text Classification (OCATC), which uses the Particle Swarm Optimization (PSO) algorithm to find the optimal solution (configuration) in this space. The proposed OCATC method first extracts features from the textual documents and converts them into numerical vectors using the Term Frequency-Inverse Document Frequency (TF–IDF) approach. Finally, PSO selects the best combination of classifier, feature selection method, and number of features. Extensive experiments were carried out to evaluate the performance of OCATC on six datasets: five publicly available datasets and our proposed dataset. The results demonstrate the superiority of OCATC over individual classifiers and other state-of-the-art methods.
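
The configuration search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain global-best PSO rather than the improved variant named in the title, and the classifier names, feature selection names, bounds on the number of features, and the toy_fitness function are all hypothetical placeholders. In a real setting the fitness would presumably be the validation score of a pipeline trained on TF–IDF vectors.

```python
import random

# Hypothetical search space; the abstract does not list the actual options.
CLASSIFIERS = ["svm", "naive_bayes", "logistic_regression"]
SELECTORS = ["chi2", "mutual_info", "anova_f"]
K_MIN, K_MAX = 100, 5000  # assumed bounds on the number of kept features


def decode(position):
    """Map a continuous position in [0, 1]^3 to a discrete configuration."""
    c = min(int(position[0] * len(CLASSIFIERS)), len(CLASSIFIERS) - 1)
    s = min(int(position[1] * len(SELECTORS)), len(SELECTORS) - 1)
    k = K_MIN + round(position[2] * (K_MAX - K_MIN))
    return CLASSIFIERS[c], SELECTORS[s], k


def toy_fitness(config):
    """Stand-in score (assumption); a real fitness would train and validate."""
    clf, sel, k = config
    score = {"svm": 0.3, "naive_bayes": 0.1, "logistic_regression": 0.2}[clf]
    score += {"chi2": 0.2, "mutual_info": 0.1, "anova_f": 0.0}[sel]
    score += 0.5 * (1.0 - abs(k - 2000) / (K_MAX - K_MIN))
    return score


def pso_search(fitness, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO that maximizes fitness over decoded configurations."""
    rng = random.Random(seed)
    dim = 3
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so that decode() always sees a position in [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(decode(pos[i]))
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return decode(gbest), gbest_fit


if __name__ == "__main__":
    best_config, best_score = pso_search(toy_fitness)
    print(best_config, best_score)
```

Each particle encodes one candidate (classifier, feature selection method, number of features) triple as three numbers in [0, 1], so the same velocity update works for both the categorical choices and the integer feature count. This encoding is one possible design choice made for the sketch, not something stated in the abstract.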

Keywords: text classification; feature selection; feature extraction; particle swarm optimization
JEL-codes: O3
Date: 2022

Downloads: (external link)
https://www.mdpi.com/1999-5903/14/7/194/pdf (application/pdf)
https://www.mdpi.com/1999-5903/14/7/194/ (text/html)


Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:14:y:2022:i:7:p:194-:d:848579


Future Internet is currently edited by Ms. Grace You

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jftint:v:14:y:2022:i:7:p:194-:d:848579