A lexicon-based term weighting scheme for emotion identification of tweets
S. Lovelyn Rose,
R. Venkatesan,
Girish Pasupathy and
P. Swaradh
International Journal of Data Analysis Techniques and Strategies, 2018, vol. 10, issue 4, 369-380
Abstract:
Detecting emotions in tweets is a huge challenge due to its limited 140 characters and extensive use of twitter language with evolving terms and slangs. This paper uses various preprocessing techniques, forms a feature vector using lexicons and classifies tweets into Paul Ekman's basic emotions namely, happy, sad, anger, fear, disgust and surprise using machine learning. Preprocessing is done using the dictionaries available for emoticons, interjections and slangs and by handling punctuation marks and hashtags. The feature vector is created by combining words from the NRC Emotion lexicon, WordNet-Affect and online thesaurus. Feature vectors are assigned weight based on the presence of punctuations and negations in the feature and the tweets are classified using naive Bayes, SVM and random forests. The use of lexicon features and a novel weighting scheme has produced a considerable gain in terms of accuracy with random forest achieving maximum accuracy of almost 73%.
Keywords: emotion classification; Twitter; preprocessing; feature selection; dictionaries; lexicons; term weighting; random forest; SVM; naive Bayes. (search for similar items in EconPapers)
Date: 2018
References: Add references at CitEc
Citations:
Downloads: (external link)
http://www.inderscience.com/link.php?id=95216 (text/html)
Access to full text is restricted to subscribers.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:ids:injdan:v:10:y:2018:i:4:p:369-380
Access Statistics for this article
More articles in International Journal of Data Analysis Techniques and Strategies from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker ().