EWNStream+: Effective and Real-time Clustering of Short Text Streams Using Evolutionary Word Relation Network

Shuiqiao Yang, Guangyan Huang, Xiangmin Zhou, Vicky Mak and John Yearwood
Additional contact information
Shuiqiao Yang: Data Science Institute, University of Technology Sydney, Ultimo, New South Wales 2007, Australia
Guangyan Huang: School of Information Technology, Deakin University, Burwood, Victoria 3125, Australia
Xiangmin Zhou: School of Computer Science and Information Technology, RMIT University, Melbourne, Victoria 3000, Australia
Vicky Mak: School of Information Technology, Deakin University, Burwood, Victoria 3125, Australia
John Yearwood: School of Information Technology, Deakin University, Burwood, Victoria 3125, Australia

International Journal of Information Technology & Decision Making (IJITDM), 2021, vol. 20, issue 01, 341-370

Abstract: The real-time clustering of short text streams has various applications, such as event tracking, text summarization and sentiment analysis. However, clustering short text streams accurately and efficiently is challenging because of the sparsity problem (i.e., the limited information contained in a single short text document leads to high-dimensional and sparse vectors when short texts are represented using traditional vector space models), topic drift and the high rate at which text streams are generated. In this paper, we propose EWNStream+, an effective and real-time method for clustering short text streams based on an Evolutionary Word relation Network. EWNStream+ constructs a bi-weighted word relation network from term frequencies and term co-occurrence statistics aggregated at the corpus level to overcome the sparsity problem and topic drift of short texts. Moreover, as the query window in the stream shifts to newly arriving data, EWNStream+ incrementally updates the word relation network by incorporating new word statistics and decaying the old ones, which naturally captures the underlying topic drift in the data streams and reduces the size of the network. Experimental results on a real-world dataset show that EWNStream+ achieves better clustering accuracy and time efficiency than several counterpart methods.
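
The incremental update described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the class name `WordRelationNetwork`, the multiplicative `decay` factor, the `min_weight` pruning threshold and the use of raw counts as node and edge weights are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


class WordRelationNetwork:
    """Toy bi-weighted word network (hypothetical, not the paper's formulation).

    Node weight = aggregated term frequency.
    Edge weight = aggregated count of documents in which two words co-occur.
    """

    def __init__(self, decay=0.5, min_weight=0.5):
        self.decay = decay            # assumed decay factor applied per window shift
        self.min_weight = min_weight  # assumed threshold below which entries are pruned
        self.nodes = Counter()        # word -> term-frequency weight
        self.edges = Counter()        # (word_a, word_b), sorted pair -> co-occurrence weight

    def add_window(self, docs):
        """Decay the old statistics, then fold in a new window of tokenized documents."""
        self._decay_and_prune()
        for tokens in docs:
            self.nodes.update(tokens)
            # Count each unordered word pair once per document.
            self.edges.update(combinations(sorted(set(tokens)), 2))

    def _decay_and_prune(self):
        for store in (self.nodes, self.edges):
            for key in list(store):
                store[key] *= self.decay
                if store[key] < self.min_weight:
                    del store[key]
        # Remove edges that lost an endpoint, so the network stays consistent.
        for a, b in list(self.edges):
            if a not in self.nodes or b not in self.nodes:
                del self.edges[(a, b)]


net = WordRelationNetwork(decay=0.5, min_weight=0.5)
net.add_window([["storm", "flood", "city"], ["storm", "flood"]])
net.add_window([["election", "vote"], ["election", "vote", "city"]])
```

After the second window, the weights of the older "storm" and "flood" statistics have been halved while the "election" and "vote" statistics enter at full weight, which is the sense in which decaying old statistics lets the network follow topic drift and stay small. Clustering on top of the network is not shown here, since the abstract does not describe that step.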

Keywords: Short text stream; clustering; topic discovery; event detection
Date: 2021

Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219622021500024
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:ijitdm:v:20:y:2021:i:01:n:s0219622021500024

DOI: 10.1142/S0219622021500024

International Journal of Information Technology & Decision Making (IJITDM) is currently edited by Yong Shi

More articles in International Journal of Information Technology & Decision Making (IJITDM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.

 
Page updated 2025-03-20
Handle: RePEc:wsi:ijitdm:v:20:y:2021:i:01:n:s0219622021500024