Investigation of Predictive Power of Sentiment Analysis Model Developed Using Different Word Embedding Techniques

Guru, Sudhanshu Kumar; Kumar, Lov

Sudhanshu Kumar Guru and Lov Kumar
Additional contact information
Sudhanshu Kumar Guru: Micron Technology
Lov Kumar: NIT Kurukshetra

Chapter 2 in Data-Driven Decision Making, 2024, pp 27-58, from Springer

Abstract: In text mining, sentiment analysis is a powerful technique for sensing the overall emotion or sentiment behind a large body of text. It helps in gauging opinion about a product, topic, or policy from thousands of online reviews, tweets, social media comments, hashtags, and similar sources. In Software Engineering (SE), the technique is also being explored and has proved an interesting way to observe developers' opinions about a new set of APIs, a code library, or even a bug, on websites like StackOverflow.com or in bug-tracking tools like Jira. Several popular tools already exist for sentiment analysis of SE texts, such as SentiStrength, EmoTxt, and VADER (NLTK). Most of these rely on a word dictionary that assigns a positive or negative score to each word. This work presents an empirical analysis of various word embedding techniques on SE text collected from three sources: StackOverflow.com, Jira, and app reviews. Because machine learning algorithms take numeric vectors as input, SE text must first be converted into vectors of numbers. Six word embedding techniques (Count Vectorization, TF-IDF, Word2Vec CBOW, Word2Vec Skip-gram, GloVe, and Word2Vec pretrained on the Google News corpus) are used to convert the input texts into vectors, and a comparison of the results shows that Word2Vec pretrained on the Google News corpus and GloVe perform almost equally well and better than the other techniques. Three feature selection/reduction techniques are used: Significant Feature (SF) Selection, Significant Predictor Feature (SPF) Selection, and Principal Component Analysis (PCA); a comparative analysis finds that SPF and SF produce very close results. Finally, eight machine learning techniques are used to build sentiment analysis models, and an empirical analysis is performed to identify the best ML method in terms of accuracy and cost.
The aim of this study is to explore which word embedding technique, in combination with feature reduction and an ML model, is best suited to sentiment analysis of SE-related text.
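
As an illustration of the text-to-vector step the abstract describes, the following is a minimal sketch of the two simplest techniques it names, Count Vectorization and TF-IDF. It is not the authors' implementation. The three sample sentences, the whitespace tokenizer, and all function names are hypothetical, and the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 is one common variant chosen here as an assumption; the chapter may use a different weighting or add normalization.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real pipelines clean text further.
    return text.lower().split()


def build_vocabulary(docs):
    # Sorted list of every distinct token, so each term gets a fixed column index.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def count_vectors(docs, vocab):
    # Count Vectorization: one row per document, one column per vocabulary term.
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vectors


def tfidf_vectors(docs, vocab):
    # TF-IDF: raw term counts scaled by a smoothed inverse document frequency.
    n_docs = len(docs)
    counts = count_vectors(docs, vocab)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1.0 for df in doc_freq]
    return [[c * w for c, w in zip(row, idf)] for row in counts]


# Made-up SE-style snippets, for illustration only.
docs = [
    "this api is great",
    "this bug is annoying",
    "great fix for the bug",
]
vocab = build_vocabulary(docs)
counts = count_vectors(docs, vocab)
tfidf = tfidf_vectors(docs, vocab)

for term, c, w in zip(vocab, counts[0], tfidf[0]):
    print(f"{term:10s} count={c} tfidf={w:.3f}")
```

In this sketch a term that occurs in only one document ("api") receives a larger TF-IDF weight than one shared by two documents ("great"), which is the property that distinguishes TF-IDF from plain counts. Either matrix could then be passed to a feature reduction step and a classifier.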

Keywords: Sentiment analysis; Word embedding; Software Engineering; Machine Learning
Date: 2024

Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-981-97-2902-9_2

Ordering information: This item can be ordered from
http://www.springer.com/9789819729029

DOI: 10.1007/978-981-97-2902-9_2
