Extracting Top Trends from Twitter Discussions in Bulgarian
Boris Bankov
Boris Bankov: University of Economics - Varna
Izvestia Journal of the Union of Scientists - Varna. Economic Sciences Series, 2017, issue 2, 254-259
Abstract:
Social networks offer plenty of opportunities and areas for scientific research in user opinion mining and text analysis. The short text messages posted online present unique challenges for automatic categorization and annotation. One interesting problem is filtering text messages by natural language. Because of the huge volume and sparsity of the textual data, machine learning algorithms are applied. In this paper we describe a way to extract, in real time, Twitter messages containing Bulgarian text. We also measure Twitter's accuracy in language identification on a 10-day dataset collected between 1 and 10 October 2017. We propose a step-by-step text preprocessing algorithm suitable for sanitizing tweets. We apply the k-means++ algorithm to cluster the extracted data and choose representative words for each cluster for each day.
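The abstract names the processing steps but not their details. Below is a minimal Python sketch of a single day's pipeline, under stated assumptions: the preprocessing rules (lowercasing, dropping URLs and @mentions, keeping only Cyrillic words of three or more letters, and a short hypothetical stopword list), the TF-IDF weighting, and the choice of representative words as the highest-weighted terms of each cluster centroid are illustrative choices, not the paper's documented method. Only the use of k-means++ comes from the abstract. All function names (`preprocess`, `cluster_day`, and so on) are hypothetical, and the code uses only the Python standard library.

```python
import math
import random
import re
from collections import Counter

# Hypothetical, deliberately short stopword list; a real pipeline would use a full Bulgarian list.
STOPWORDS = {"и", "в", "на", "за", "се", "да", "не", "от", "с", "че", "е", "са",
             "по", "това", "като", "който", "което"}
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
CYRILLIC_WORD_RE = re.compile(r"[а-яѝ]+")  # lowercase Bulgarian letters only


def preprocess(text, min_len=3):
    """Lowercase, drop URLs and @mentions, keep Cyrillic words, drop stopwords and short tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # '#' signs, digits, punctuation, emoji and Latin tokens such as "rt" are discarded here.
    tokens = CYRILLIC_WORD_RE.findall(text)
    return [t for t in tokens if len(t) >= min_len and t not in STOPWORDS]


def tfidf_vectors(docs):
    """Build L2-normalised TF-IDF vectors over a sorted vocabulary (docs must be non-empty)."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for term, count in Counter(doc).items():
            vec[index[term]] = (count / len(doc)) * (math.log(n / df[term]) + 1.0)
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec])
    return vectors, vocab


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional to D(x)^2."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point already coincides with a centre
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's iterations starting from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centres would not move either
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def top_words(center, vocab, n):
    """Representative words: the n terms with the largest centroid weight."""
    ranked = sorted(range(len(vocab)), key=lambda i: (-center[i], vocab[i]))
    return [vocab[i] for i in ranked[:n] if center[i] > 0]


def cluster_day(texts, k, n_words=3, seed=0):
    """Cluster one day's tweets and return the representative words of each cluster."""
    docs = [d for d in (preprocess(t) for t in texts) if d]  # skip tweets left empty
    if len(docs) < k:
        raise ValueError("need at least k non-empty tweets")
    vectors, vocab = tfidf_vectors(docs)
    _, centers = kmeans(vectors, k, seed=seed)
    return [top_words(c, vocab, n_words) for c in centers]
```

For the per-day analysis the abstract describes, tweets would be grouped by calendar date and `cluster_day` called once per group. In practice a library implementation such as scikit-learn's `KMeans`, whose default `init` is `"k-means++"`, would replace the hand-written loop; it is spelled out here only to show what k-means++ seeding does.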
Keywords: Twitter text mining; text clustering; social media; data mining; Bulgarian text mining; Bulgarian text clustering
JEL-codes: A00
Date: 2017
Citations: 1 (as counted in EconPapers)
Downloads: http://www.su-varna.org/izdanij/2017/ikonomika-017-2/254-259.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:vra:journl:y:2017:i:2:p:254-259
More articles in Izvestia Journal of the Union of Scientists - Varna. Economic Sciences Series from Union of Scientists - Varna, Economic Sciences Section.
Bibliographic data for series maintained by Pavel Petrov.