Towards News Classification, a Machine Learning Approach Based on Stemming and Feature Extraction
Enas Jasim Hadi,
Mohammed Fadhil Ibrahim and
Ahmed Idan Mohammed
Additional contact information
Enas Jasim Hadi: Mustansiriyah University, Rasafa, Baghdad, Iraq
Mohammed Fadhil Ibrahim: Middle Technical University, Rasafa, Baghdad, Iraq
Ahmed Idan Mohammed: Mustansiriyah University, Rasafa, Baghdad, Iraq
Journal of Information & Knowledge Management (JIKM), 2025, vol. 24, issue 05, 1-22
Abstract:
News is an important information source in most communities worldwide. News apps offer features such as notifications and preference management that keep individuals in touch with surrounding events and lead users to prefer news sites and apps over traditional news channels (e.g. television). News classification is pivotal for navigating the vast amount of information on digital platforms. This study introduces a machine learning-based model tailored for Arabic news articles, addressing the complexities inherent in the language’s morphology. Using a dataset of 250k articles from the National Iraqi News Agency (NINA), we applied k-nearest neighbour (k-NN) and logistic regression classifiers alongside feature extraction methods such as TF-IDF, Bag-of-Words and n-Gram, together with three stemmers (Khoja, Snowball and Tashaphyne). We evaluate performance with the F1 score, comparing the different classification methods, pre-processing tools and stemmers. Both classifiers achieved good classification performance, while the k-NN classifier showed a noticeable sensitivity when combined with n-Gram features. Arabic script also proved remarkably complex because of its morphological nature. The results indicate a good advancement in Arabic script-related studies and underscore the potential of machine learning in this area, with implications for information retrieval systems and digital news platforms.
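The pipeline named in the abstract (n-gram and TF-IDF features, a k-NN classifier, F1 evaluation) can be illustrated with a minimal from-scratch sketch. This is not the authors' implementation. It assumes the documents are already tokenised and stemmed, uses English placeholder tokens instead of NINA articles, and picks a common smoothed IDF formula, cosine similarity, k = 3 and macro-averaged F1, none of which the abstract specifies. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n_max=2):
    """Word n-grams of length 1..n_max; n_max=1 reduces to Bag-of-Words terms."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def fit_idf(docs):
    """Smoothed inverse document frequency per term (an assumed weighting)."""
    df = Counter(term for doc in docs for term in set(doc))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def knn_predict(x, train_vecs, train_labels, k=3):
    """Majority vote among the k most similar training vectors."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(x, train_vecs[i]), reverse=True)
    return Counter(train_labels[i] for i in ranked[:k]).most_common(1)[0][0]

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy corpus standing in for stemmed news articles.
train = [
    ("team win match goal", "sport"),
    ("player score goal match", "sport"),
    ("coach team player league", "sport"),
    ("bank raise interest rate", "economy"),
    ("market price oil rise", "economy"),
    ("central bank market rate", "economy"),
]
test = [("player win league match", "sport"), ("oil price bank rate", "economy")]

train_docs = [ngrams(text.split()) for text, _ in train]
train_labels = [label for _, label in train]
idf = fit_idf(train_docs)
train_vecs = [tfidf(doc, idf) for doc in train_docs]

predictions = [knn_predict(tfidf(ngrams(text.split()), idf), train_vecs, train_labels)
               for text, _ in test]
score = macro_f1([label for _, label in test], predictions)
print(predictions, score)
```

Changing `n_max` switches between unigram (Bag-of-Words style) and higher-order n-gram terms. That makes it a convenient knob for probing the kind of k-NN sensitivity to n-gram features that the abstract reports.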
Keywords: News classification; logistic regression; k-nearest neighbour; stemming
Date: 2025
Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0219649225500455
Access to full text is restricted to subscribers
Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:24:y:2025:i:05:n:s0219649225500455
DOI: 10.1142/S0219649225500455
Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh
More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.