A model for sentiment and emotion analysis of unstructured social media text

Rout, Jitendra Kumar; Choo, Kim-Kwang Raymond; Dash, Amiya Kumar; Bakshi, Sambit; Jena, Sanjay Kumar; Williams, Karen L.

Jitendra Kumar Rout, Kim-Kwang Raymond Choo, Amiya Kumar Dash, Sambit Bakshi, Sanjay Kumar Jena and Karen L. Williams
Additional contact information
Jitendra Kumar Rout: National Institute of Technology
Kim-Kwang Raymond Choo: University of Texas at San Antonio
Amiya Kumar Dash: National Institute of Technology
Sambit Bakshi: National Institute of Technology
Sanjay Kumar Jena: National Institute of Technology
Karen L. Williams: University of Texas at San Antonio

Electronic Commerce Research, 2018, vol. 18, issue 1, No 10, 199 pages

Abstract: Sentiment analysis has applications in diverse contexts, such as gathering and analyzing individuals' opinions about products, issues, and social and political events. Understanding public opinion can help improve decision making. Opinion mining is a way of retrieving information via search engines, blogs, microblogs and social networks. Individual opinions are unique to each person, and Twitter tweets are an invaluable source of this type of data. However, the huge volume and unstructured nature of text/opinion data pose a challenge to analyzing the data efficiently. Accordingly, proficient algorithms and computational strategies are required for mining and condensing tweets as well as finding sentiment-bearing words. Most existing computational methods in the literature for identifying sentiment in such unstructured data rely on machine learning techniques with the bag-of-words approach as their basis. In this work, we use both unsupervised and supervised approaches on various datasets. The unsupervised approach is used for the automatic identification of sentiment in tweets acquired from the Twitter public domain. Different machine learning algorithms, such as Multinomial Naive Bayes (MNB), Maximum Entropy and Support Vector Machines, are applied for sentiment identification of tweets as well as to examine the effectiveness of various feature combinations. In our experiment on tweets, we achieve an accuracy of 80.68% using the proposed unsupervised approach, compared with 75.20% for the lexicon-based approach. In our experiments, the supervised approach that combines unigram, bigram and Part-of-Speech features is efficient in finding the emotion and sentiment of unstructured data. For short message services, using the unigram feature with the MNB classifier allows us to achieve an accuracy of 67%.

Keywords: Sentiment analysis; Bag-of-words; Lexicon; Laplace smoothing; Parts-of-Speech (POS); Machine learning
Date: 2018
Citations: 13 (EconPapers)

Downloads: (external link)
http://link.springer.com/10.1007/s10660-017-9257-8 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:elcore:v:18:y:2018:i:1:d:10.1007_s10660-017-9257-8

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10660

DOI: 10.1007/s10660-017-9257-8

Electronic Commerce Research is currently edited by James Westland

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.