TED-S: Twitter Event Data in Sports and Politics with Aggregated Sentiments
Hansi Hettiarachchi,
Doaa Al-Turkey,
Mariam Adedoyin-Olowe,
Jagdev Bhogal and
Mohamed Medhat Gaber
Additional contact information
Hansi Hettiarachchi: School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
Doaa Al-Turkey: School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
Mariam Adedoyin-Olowe: School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
Jagdev Bhogal: School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
Mohamed Medhat Gaber: School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
Data, 2022, vol. 7, issue 7, 1-16
Abstract:
Although social media contain rich information on events and public opinion, manually filtering this information is impractical because the data are generated in vast volumes and change rapidly. Thus, automated extraction mechanisms are invaluable to the community. Building and evaluating such systems requires real data with ground-truth labels. Still, to the best of our knowledge, no available social media dataset covers continuous periods with both event and sentiment labels; existing datasets label either events or sentiments, not both. Datasets without time gaps are large because of the high rate of data generation and require extensive manual labelling effort. Previous research has proposed various approaches, ranging from unsupervised to supervised, for labelling such datasets. However, because these approaches are generic, they mostly fail to capture event-specific sentiment expressions, making them inappropriate for labelling event sentiments. To fill this gap, this paper proposes a novel data annotation approach involving several neural networks. Our approach outperforms commonly used sentiment annotation models such as VADER and TextBlob. In addition to a single category per tweet, it generates probability values for all sentiment categories, supporting aggregated sentiment analyses. Using this approach, we annotate and release a dataset named TED-S, covering two diverse domains, sports and politics. TED-S contains complete subsets of Twitter data streams with both sub-event and sentiment labels, supporting event sentiment-based research.
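As an illustration of how per-tweet probability values can support aggregated sentiment analysis, the following is a minimal sketch, not the authors' method. It assumes three sentiment categories named negative, neutral and positive, tweets given as (timestamp in seconds, probability dictionary) pairs, and aggregation by a simple mean over fixed time windows; the function names `aggregate_sentiment` and `top_label` are hypothetical and the input layout is not the TED-S file format.

```python
from collections import defaultdict

# Assumed category names; the released dataset may name them differently.
LABELS = ("negative", "neutral", "positive")


def aggregate_sentiment(tweets, window_seconds=60):
    """Average per-tweet class probabilities within fixed time windows.

    tweets: iterable of (timestamp_seconds, {label: probability}) pairs.
    Returns {window_start: {label: mean probability}}, ordered by window.
    """
    sums = defaultdict(lambda: dict.fromkeys(LABELS, 0.0))
    counts = defaultdict(int)
    for ts, probs in tweets:
        # Map the timestamp to the start of its window.
        window = int(ts // window_seconds) * window_seconds
        for label in LABELS:
            sums[window][label] += probs[label]
        counts[window] += 1
    return {
        w: {label: sums[w][label] / counts[w] for label in LABELS}
        for w in sorted(sums)
    }


def top_label(probs):
    """Return the category with the highest (aggregated) probability."""
    return max(probs, key=probs.get)
```

With such a helper, a window's overall sentiment can be read off with `top_label(aggregate_sentiment(tweets)[window_start])`, while the full averaged distribution remains available for finer-grained analysis.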
Keywords: event detection; sentiment analysis; aggregated sentiments; Twitter; ensembled data annotation
JEL-codes: C8 C80 C81 C82 C83
Date: 2022
Downloads: (external link)
https://www.mdpi.com/2306-5729/7/7/90/pdf (application/pdf)
https://www.mdpi.com/2306-5729/7/7/90/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:7:y:2022:i:7:p:90-:d:853435
Data is currently edited by Ms. Cecilia Yang