EconPapers    
Economics at your fingertips  
 

Comparison of Supervised Classification Models on Textual Data

Bi-Min Hsu
Additional contact information
Bi-Min Hsu: Department of Industrial Engineering & Management, Cheng Shiu University, Kaohsiung 83347, Taiwan

Mathematics, 2020, vol. 8, issue 5, 1-16

Abstract: Text classification is an essential aspect in many applications, such as spam detection and sentiment analysis. With the growing number of textual documents and datasets generated through social media and news articles, an increasing number of machine learning methods are required for accurate textual classification. For this paper, a comprehensive evaluation of the performance of multiple supervised learning models, such as logistic regression (LR), decision trees (DT), support vector machine (SVM), AdaBoost (AB), random forest (RF), multinomial naive Bayes (NB), multilayer perceptrons (MLP), and gradient boosting (GB), was conducted to assess the efficiency and robustness, as well as limitations, of these models on the classification of textual data. SVM, LR, and MLP had better performance in general, with SVM being the best, while DT and AB had much lower accuracies amongst all the tested models. Further exploration on the use of different SVM kernels was performed, demonstrating the advantage of using linear kernels over polynomial, sigmoid, and radial basis function kernels for text classification. The effects of removing stop words on model performance was also investigated; DT performed better with stop words removed, while all other models were relatively unaffected by the presence or absence of stop words.

Keywords: machine learning; text classification; sentiment analysis; IMDb reviews; 20 newsgroups (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2020
References: View complete reference list from CitEc
Citations: View citations in EconPapers (3)

Downloads: (external link)
https://www.mdpi.com/2227-7390/8/5/851/pdf (application/pdf)
https://www.mdpi.com/2227-7390/8/5/851/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:8:y:2020:i:5:p:851-:d:362250

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:8:y:2020:i:5:p:851-:d:362250