Experimenting Language Identification for Sentiment Analysis of English Punjabi Code Mixed Social Media Text

Bansal, Neetika; Goyal, Vishal; Rani, Simpel

Neetika Bansal: College of Engineering & Management, India
Vishal Goyal: Punjabi University, India
Simpel Rani: Yadavindra College of Engineering, India

International Journal of E-Adoption (IJEA), 2020, vol. 12, issue 1, 52-62

Abstract: People do not always use Unicode; rather, they mix multiple languages. Processing code-mixed data is challenging because of its linguistic complexity, and noisy text makes language identification harder still. The dataset used in this article contains Facebook and Twitter messages collected through the Facebook Graph API and the Twitter API. Models were trained on the annotated English-Punjabi code-mixed dataset using a pipeline of a Dictionary Vectorizer and an N-gram approach with some features. Logistic Regression, Decision Tree and Gaussian Naïve Bayes classifiers are used to perform language identification at the word level. The results show that Logistic Regression performs best, with an accuracy of 86.63 and an F1 measure of 0.88. The success of machine learning approaches depends on the quality of labeled corpora.
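
The pipeline named in the abstract (a dictionary vectorizer over n-gram features, feeding a word-level classifier) might look roughly like the sketch below. It is an illustration only, not the authors' implementation: it assumes scikit-learn, character bigrams and trigrams, two surface features (`length`, `has_digit`), and a tiny made-up list of romanized tokens (`TOY_PAIRS`); the function names `word_features`, `train` and `tag` are hypothetical, and the article's actual feature set and data are not given here.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def word_features(word, n_values=(2, 3)):
    """Map one token to a feature dict (assumed features, not the paper's)."""
    w = word.lower()
    padded = f"^{w}$"  # mark word boundaries so prefixes and suffixes are distinct
    feats = {
        "length": len(w),
        "has_digit": int(any(c.isdigit() for c in w)),
    }
    for n in n_values:
        for i in range(len(padded) - n + 1):
            feats[f"{n}gram={padded[i:i + n]}"] = 1
    return feats


# Hypothetical toy data: (token, language tag) pairs, "en" or "pa".
TOY_PAIRS = [
    ("the", "en"), ("what", "en"), ("good", "en"), ("where", "en"),
    ("happy", "en"), ("movie", "en"), ("today", "en"), ("friend", "en"),
    ("tusi", "pa"), ("kiven", "pa"), ("haan", "pa"), ("nahi", "pa"),
    ("changa", "pa"), ("kithe", "pa"), ("mera", "pa"), ("tuhada", "pa"),
]


def train(pairs):
    """Fit a DictVectorizer + Logistic Regression pipeline on labeled tokens."""
    model = Pipeline([
        ("vec", DictVectorizer(sparse=True)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    X = [word_features(word) for word, _ in pairs]
    y = [label for _, label in pairs]
    model.fit(X, y)
    return model


def tag(model, words):
    """Predict one language label per token."""
    return model.predict([word_features(w) for w in words]).tolist()


if __name__ == "__main__":
    model = train(TOY_PAIRS)
    for word, label in zip(["movie", "kiven"], tag(model, ["movie", "kiven"])):
        print(word, label)
```

Swapping the `clf` step for `DecisionTreeClassifier` or `GaussianNB` would give the other two classifiers the abstract mentions; note that scikit-learn's `GaussianNB` expects dense input, so the vectorizer would need `sparse=False` in that case.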

Date: 2020

Downloads: (external link)
https://services.igi-global.com/resolvedoi/resolve ... 4018/IJEA.2020010105 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:igg:jea000:v:12:y:2020:i:1:p:52-62

International Journal of E-Adoption (IJEA) is currently edited by Hayden Wimmer

More articles in International Journal of E-Adoption (IJEA) from IGI Global Scientific Publishing