Improving sentiment analysis using preprocessing techniques and lexical patterns

Stefano Cagnoni, Laura Ferrari, Paolo Fornacciari, Monica Mordonini, Laura Sani and Michele Tomaiuolo

International Journal of Data Analysis Techniques and Strategies, 2021, vol. 13, issue 3, 171-185

Abstract: Sentiment analysis has recently gained considerable attention, since the classification of the emotional content of a text (online reviews, blog messages etc.) may have a relevant impact on market research, political science and many other fields. In this paper, we focus on the importance of the text preprocessing phase, proposing a new technique we termed lexical pattern-based feature weighting (LPFW) that allows one to improve sentence-level sentiment analysis by increasing the relevance of the features contained in particular lexical patterns. This approach has been evaluated on two sentiment classification datasets. We show that a systematic optimisation of the preprocessing filters is important for obtaining good classification accuracy. Also, we show that LPFW is effective in different application domains and with different training set sizes.

Keywords: sentiment analysis; natural language processing; POS tagging; feature weighting; word stemming; bag-of-words representation; tf-idf; Penn Treebank Tagset; support vector machines; naïve Bayes multinomial classifier
Date: 2021

Downloads: (external link)
http://www.inderscience.com/link.php?id=118022 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:ids:injdan:v:13:y:2021:i:3:p:171-185

Publisher: Inderscience Enterprises Ltd
Handle: RePEc:ids:injdan:v:13:y:2021:i:3:p:171-185