EconPapers    
Economics at your fingertips  
 

Multi-Class Text Classification on Khmer News Using Ensemble Method in Machine Learning Algorithms

Raksmey Phann, Chitsutha Soomlek and Pusadee Seresangtakul

Acta Informatica Pragensia, 2023, vol. 2023, issue 2, 243-259

Abstract: The research herein applies text classification with which to categorize Khmer news articles. News articles were collected from three online websites through web scraping and grouped into nine categories. After text preprocessing, the dataset was split into training and testing sets. We then evaluated the performance of the ensemble learning method via machine learning classifiers with k-fold validation. Various machine learning classifiers were employed, namely logistic regression, Complement Naive Bayes, Bernoulli Naive Bayes, k-nearest neighbours, perceptron, support vector machines, stochastic gradient descent, AdaBoost, decision tree, and random forest were employed. Accuracy was improved for the categorization of Khmer news articles, in which Grid Search CV was used to find the optimal hyperparameters for each machine learning classifier with feature extraction TF-IDF and Delta TF-IDF. The results determined that the highest accuracy was achieved through the ensemble learning method in the support vector machine with the optimal hyperparameters (C = 10, kernel = rbf), using feature extraction TF-IDF and Delta TF-IDF, at 83.47% and 83.40%, respectively. The model establishes that Khmer news articles can be accurately categorized.

Keywords: Text classification; Khmer news; Machine learning; Feature extraction; Optimal hyperparameters; News categorization; Ensemble learning method (search for similar items in EconPapers)
Date: 2023
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
http://aip.vse.cz/doi/10.18267/j.aip.210.html (text/html)
http://aip.vse.cz/doi/10.18267/j.aip.210.pdf (application/pdf)
free of charge

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:prg:jnlaip:v:2023:y:2023:i:2:id:210:p:243-259

Ordering information: This journal article can be ordered from
Redakce Acta Informatica Pragensia, Katedra systémové analýzy, Vysoká škola ekonomická v Praze, nám. W. Churchilla 4, 130 67 Praha 3
http://aip.vse.cz

DOI: 10.18267/j.aip.210

Access Statistics for this article

Acta Informatica Pragensia is currently edited by Editorial Office

More articles in Acta Informatica Pragensia from Prague University of Economics and Business Contact information at EDIRC.
Bibliographic data for series maintained by Stanislav Vojir ().

 
Page updated 2025-03-19
Handle: RePEc:prg:jnlaip:v:2023:y:2023:i:2:id:210:p:243-259