The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis

Saqib Alam and Nianmin Yao
Saqib Alam: Dalian University of Technology
Nianmin Yao: Dalian University of Technology

Computational and Mathematical Organization Theory, 2019, vol. 25, issue 3, No 5, 319-335

Abstract: Big data and its related technologies have recently become active areas of research. A huge amount of data is generated every minute, much of it unstructured, and this unstructured data is now a topic of interest for researchers. Considerable research is under way in text analytics and text preprocessing. In this paper, we study the impact of different preprocessing steps on the accuracy of three machine learning algorithms for sentiment analysis. We applied different text preprocessing techniques and measured their impact on sentiment classification accuracy using three well-known machine learning classifiers: Naïve Bayes (NB), maximum entropy (MaxE), and support vector machines (SVM). We calculated the accuracy of the three algorithms before and after applying the preprocessing steps. The results showed that the accuracy of the NB algorithm improved significantly after preprocessing, that the accuracy of the SVM algorithm improved slightly, and, interestingly, that the accuracy of the MaxE algorithm did not improve. Our work is a comparative study, and our results showed that, after applying the preprocessing steps, the accuracy of the NB algorithm was significantly higher than that of the other algorithms, followed by MaxE and SVM. This research shows that text preprocessing affects the accuracy of machine learning algorithms. It further concludes that, for the NB algorithm, accuracy improved significantly after applying the text preprocessing steps.
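
The before-and-after comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not list the preprocessing steps, the dataset, or the implementation, so the cleaning steps (lowercasing, URL removal, punctuation stripping, stop-word removal), the stop-word list, the four training sentences, and all function and class names below are assumptions made for illustration. Only a small multinomial Naïve Bayes classifier is shown, and the toy result says nothing about the figures reported in the paper.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"the", "a", "an", "is", "it", "this", "was", "i", "and", "to", "of"}


def tokenize_raw(text):
    """Baseline: split on whitespace only, with no cleaning."""
    return text.split()


def tokenize_preprocessed(text):
    """Lowercase, drop URLs, strip punctuation and digits, remove stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.class_counts = Counter()            # label -> document count
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in self.tokenizer(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        # Sorted labels make tie-breaking deterministic.
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(self.class_counts[label] / total_docs)
            for tok in self.tokenizer(text):
                if tok in self.vocab:  # ignore tokens unseen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


# Toy data invented for this sketch; not taken from the paper.
train_texts = [
    "Great movie, loved it!",
    "Terrible plot. Awful acting.",
    "LOVED the soundtrack",
    "awful, boring film",
]
train_labels = ["pos", "neg", "pos", "neg"]
test_texts = ["Loved it", "Awful!!!"]
test_labels = ["pos", "neg"]

raw_model = NaiveBayes(tokenize_raw).fit(train_texts, train_labels)
clean_model = NaiveBayes(tokenize_preprocessed).fit(train_texts, train_labels)

raw_acc = accuracy(raw_model, test_texts, test_labels)
clean_acc = accuracy(clean_model, test_texts, test_labels)
print(f"accuracy without preprocessing: {raw_acc:.2f}")
print(f"accuracy with preprocessing:    {clean_acc:.2f}")
```

In this toy setting, the raw tokenizer treats "Loved", "loved", and "LOVED" as unrelated tokens and leaves punctuation attached to words, so test words often fail to match anything seen in training. Normalizing the text merges those variants, which is one plausible reason a count-based classifier such as NB could benefit from preprocessing.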

Keywords: Preprocessing; Machine learning; Sentiment analysis; Word2Vec
Date: 2019
Citations: 4 (in EconPapers)

Downloads: (external link)
http://link.springer.com/10.1007/s10588-018-9266-8 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:comaot:v:25:y:2019:i:3:d:10.1007_s10588-018-9266-8

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10588

DOI: 10.1007/s10588-018-9266-8

Computational and Mathematical Organization Theory is currently edited by Terrill Frantz and Kathleen Carley

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:comaot:v:25:y:2019:i:3:d:10.1007_s10588-018-9266-8