Identification of Review Helpfulness Using Novel Textual and Language-Context Features
Muhammad Shehrayar Khan,
Atif Rizwan,
Muhammad Shahzad Faisal,
Tahir Ahmad,
Muhammad Saleem Khan and
Ghada Atteia
Additional contact information
Muhammad Shehrayar Khan: Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan
Atif Rizwan: Department of Computer Engineering, Jeju National University, Jeju-si 63243, Korea
Muhammad Shahzad Faisal: Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan
Tahir Ahmad: Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan
Muhammad Saleem Khan: Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan
Ghada Atteia: Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
Mathematics, 2022, vol. 10, issue 18, 1-20
Abstract:
With the increase in users of social media websites such as IMDb, a movie website, and the rise of publicly available data, opinion mining is more accessible than ever. In the research field of language understanding, categorizing movie reviews can be challenging because human language is complex and often involves connotative words, whose intended meaning differs from their literal meaning. The context in which a word is used also changes its semantics. This work investigates categorizing movie reviews with good F-Measure scores using Word2Vec, and inspects three different groups of proposed features. First, psychological features are extracted from the reviews: positive emotion, negative emotion, anger, sadness, clout (confidence level) and dictionary words. Second, readability features are extracted: the Automated Readability Index (ARI), the Coleman-Liau Index (CLI) and Word Count (WC) are calculated to measure each review's understandability, and their impact on review classification performance is measured. Lastly, linguistic features are extracted from the reviews: adjectives and adverbs. A Word2Vec model is trained on a collection of 50,000 movie reviews. This self-trained Word2Vec model is used for the contextualized embedding of words into vectors with 50, 100, 150 and 300 dimensions. The pretrained Word2Vec model converts words into vectors with 150 and 300 dimensions. Traditional and advanced machine-learning (ML) algorithms are applied and evaluated according to four performance measures: accuracy, precision, recall and F-Measure. The results indicate that a Support Vector Machine (SVM) using self-trained Word2Vec achieved an 86% F-Measure, and that concatenating the psychological, linguistic and readability features with the Word2Vec features raised the SVM's F-Measure to 87.93%.
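As an illustration of the readability features named in the abstract, the following is a minimal Python sketch that computes WC, ARI and CLI for a single review using the commonly published formulas for the two indices. The function name `readability_features`, the regex-based tokenization and the sentence-splitting rule are assumptions made for this sketch; the abstract does not say how the authors computed these scores, and their tooling may differ.

```python
import re


def readability_features(text: str) -> dict:
    """Return word count, ARI and CLI for one review (illustrative sketch only)."""
    # Assumption: a word is a run of letters, digits or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Assumption: sentences end at '.', '!' or '?'; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    n_words = len(words)
    if n_words == 0:
        return {"wc": 0, "ari": 0.0, "cli": 0.0}
    n_sentences = max(len(sentences), 1)

    # ARI counts letters and digits; CLI counts letters only.
    n_chars = sum(ch.isalnum() for w in words for ch in w)
    n_letters = sum(ch.isalpha() for w in words for ch in w)

    # Automated Readability Index (standard formula).
    ari = 4.71 * n_chars / n_words + 0.5 * n_words / n_sentences - 21.43

    # Coleman-Liau Index: letters and sentences per 100 words (standard formula).
    letters_per_100 = 100 * n_letters / n_words
    sentences_per_100 = 100 * n_sentences / n_words
    cli = 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

    return {"wc": n_words, "ari": ari, "cli": cli}


if __name__ == "__main__":
    print(readability_features("The movie was great. I loved it!"))
```

In a pipeline like the one the abstract describes, these three values would be appended to each review's embedding vector before classification; the exact concatenation order and any scaling are not specified in the abstract.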
Keywords: neural network; Word2Vec; Natural Language Processing; sentiment classification
JEL-codes: C
Date: 2022
Downloads:
https://www.mdpi.com/2227-7390/10/18/3260/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/18/3260/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:18:p:3260-:d:909490
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.