A Classifier to Detect Informational vs. Non-Informational Heart Attack Tweets
Ola Karajeh,
Dirar Darweesh,
Omar Darwish,
Noor Abu-El-Rub,
Belal Alsinglawi and
Nasser Alsaedi
Additional contact information
Ola Karajeh: Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
Dirar Darweesh: Department of Computer Science, Jordan University of Science and Technology, 3030 Irbid, Jordan
Omar Darwish: Computer Technology and Information Systems, Ferrum College, Ferrum, VA 24088, USA
Noor Abu-El-Rub: Kansas Medical Center, Kansas City, MO 67002, USA
Belal Alsinglawi: School of Computer Data and Mathematical Sciences, Western Sydney University, Rydalmere, NSW 2116, Australia
Nasser Alsaedi: Department of Computer Science, Taibah University, 2003 Medina, Saudi Arabia
Future Internet, 2021, vol. 13, issue 1, 1-10
Abstract:
Social media sites are an important source of data in many fields, such as health, education, and politics. While surveys provide explicit answers to specific questions, social media posts contain the same answers implicitly in their text. This research aims to develop a method for extracting implicit answers from large tweet collections and to demonstrate it on an important health concern: heart attacks. The approach is to collect tweets containing “heart attack” and then select those that carry useful information. Informational tweets describe real heart attack events, e.g., “Yesterday morning, my grandfather had a heart attack while he was walking around the garden.” Non-informational tweets, by contrast, include posts such as “Dropped my iPhone for the first time and almost had a heart attack.” The starting point was to manually classify around 7000 tweets as either informational (11%) or non-informational (89%), yielding a labeled dataset for building a machine learning classifier that can be applied to our large collection of over 20 million tweets. Tweets were cleaned and converted to a vector representation suitable as input to several machine learning algorithms: deep neural networks, support vector machine (SVM), J48 decision tree, and naïve Bayes. Our experiments aimed to find the algorithm that yields the highest-quality classifier. Evaluation used both a split of the labeled dataset, with 2/3 for training and 1/3 for testing, and cross-validation. The deep neural network (DNN) classifier obtained the highest accuracy (95.2%). It also obtained the highest F1-scores, 73.6% and 97.4% for the informational and non-informational classes, respectively.
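The abstract reports both accuracy and per-class F1-scores because the labeled set is imbalanced (11% informational). The following is a minimal Python sketch, not the authors' implementation, of the cleaning, vectorization, and evaluation steps. The cleaning rules, the bag-of-words representation, and the names clean_tweet, bag_of_words, accuracy, and f1_for_class are all assumptions made for illustration; the abstract does not specify them.

```python
import re
from collections import Counter


def clean_tweet(text):
    """Hypothetical cleaning step: lowercase, drop URLs and @mentions,
    strip the '#' sign, and return the remaining word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = text.replace("#", " ")
    return re.findall(r"[a-z0-9']+", text)


def bag_of_words(tokens, vocabulary):
    """Assumed vector representation: one count per vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1-score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels mirroring the 11% / 89% class balance described above.
y_true = ["info"] * 11 + ["non"] * 89
# A trivial baseline that labels every tweet as non-informational.
y_pred = ["non"] * 100

baseline_accuracy = accuracy(y_true, y_pred)
baseline_f1_info = f1_for_class(y_true, y_pred, "info")
baseline_f1_non = f1_for_class(y_true, y_pred, "non")
```

On these toy labels the trivial baseline already reaches 89% accuracy while its F1-score for the informational class is zero, which is why the per-class F1-scores are the more telling figures for the minority class.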
Keywords: machine learning; classification; support vector machine; deep neural networks; tweets; heart attack; health
JEL-codes: O3
Date: 2021
Downloads:
https://www.mdpi.com/1999-5903/13/1/19/pdf (application/pdf)
https://www.mdpi.com/1999-5903/13/1/19/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:13:y:2021:i:1:p:19-:d:481363
Future Internet is currently edited by Ms. Grace You
Bibliographic data for series maintained by MDPI Indexing Manager.