Classifying insincere questions on Question Answering (QA) websites: meta-textual features and word embedding
Mohammad Al-Ramahi and Izzat Alsmadi
Journal of Business Analytics, 2021, vol. 4, issue 1, 55-66
Abstract:
The power of information and information exchange defines the current Internet and Online Social Networks (OSNs). With such power and influence, individuals and entities expose those networks to different types of false information. This paper proposes several classification models based on Quora insincere questions, a dataset released by Kaggle. We evaluated several models, including word embeddings, based on meta-textual and word-level features. The best results were achieved using the BERT transformer, with an overall accuracy of more than 95% on several individual classifiers. Overall, the results indicated that meta-textual features are important predictors of whether a question is sincere or not. One implication is that users appear to put more cognitive effort into writing sincere questions, which are more readable than insincere questions. Moreover, a dictionary was assembled from several explicit dictionaries and significant words selected from Quora questions. The dictionary showed good performance in predicting insincere questions.
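To illustrate what meta-textual features and a dictionary-based signal might look like in practice, here is a minimal sketch. It is not the authors' implementation: the function name `meta_features`, the specific features chosen, and the tiny word list `EXAMPLE_LEXICON` are all assumptions made for illustration, since the abstract does not list the actual features or dictionary entries.

```python
import string

# Hypothetical word list for illustration only; the dictionary assembled
# in the paper is not reproduced here.
EXAMPLE_LEXICON = {"stupid", "idiot", "hate"}


def meta_features(question: str) -> dict:
    """Compute a few simple meta-textual features for one question string."""
    # Strip surrounding punctuation and lowercase each whitespace-separated word.
    tokens = [w.strip(string.punctuation).lower() for w in question.split()]
    tokens = [t for t in tokens if t]  # drop tokens that were pure punctuation

    n_chars = len(question)
    n_words = len(tokens)
    return {
        "n_chars": n_chars,
        "n_words": n_words,
        # Average token length; 0.0 for an empty question.
        "avg_word_len": sum(len(t) for t in tokens) / n_words if n_words else 0.0,
        # Number of ASCII punctuation characters in the raw text.
        "n_punct": sum(ch in string.punctuation for ch in question),
        # Share of characters that are uppercase; 0.0 for an empty question.
        "upper_ratio": sum(ch.isupper() for ch in question) / n_chars if n_chars else 0.0,
        # Number of tokens found in the (hypothetical) lexicon.
        "lexicon_hits": sum(t in EXAMPLE_LEXICON for t in tokens),
    }


if __name__ == "__main__":
    print(meta_features("Why are people so stupid?"))
```

Features of this kind could be fed to any standard classifier alongside word-level representations; which features and classifiers the paper actually used is not stated in the abstract.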
Date: 2021
Downloads: http://hdl.handle.net/10.1080/2573234X.2021.1895681 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:tjbaxx:v:4:y:2021:i:1:p:55-66
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/tjba20
DOI: 10.1080/2573234X.2021.1895681
Journal of Business Analytics is currently edited by Dursun Delen
Bibliographic data for series maintained by Chris Longhurst.