Improved Text Clustering Using k-Mean Bayesian Vectoriser
Hanan M. Alghamdi,
Ali Selamat and
Nor Shahriza Abdul Karim
Additional contact information
Hanan M. Alghamdi: Faculty of Computer Science, Umm Al-Qura University, Al-Gunfdh, Saudi Arabia
Ali Selamat: Faculty of Computing, Universiti Teknologi Malaysia, UTM Johor Bahru, Johor 81310, Malaysia
Nor Shahriza Abdul Karim: Computer & Information Science Department, Prince Sultan University, 66833 Rafha Street, Riyadh 11586, Saudi Arabia
Journal of Information & Knowledge Management (JIKM), 2014, vol. 13, issue 03, 1-10
Abstract:
Previous studies report that high-dimensional data reduces the efficiency of clustering algorithms and increases their execution time. In this paper, we propose an approach called BV-kmeans (Bayesian Vectorisation with k-means) that aims to improve document representation models for text clustering. The approach integrates k-means document clustering with a Bayesian Vectoriser, which computes the probability distribution of the documents in the vector space, in order to overcome the problems of high-dimensional data and reduce running time. To determine which metrics are effective for modelling the similarity between documents under the proposed approach, we use three similarity measures: K divergence, squared Euclidean distance and squared χ² distance. We evaluated the approach on high-dimensional data from a set of common newspaper websites. Experimental results show that, compared with the standard k-means algorithm, the proposed approach increases the degree to which a cluster contains documents from a single category by 85% and lowers the runtime by 95%.
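The three similarity measures named in the abstract can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the formulas are the textbook definitions for probability vectors, and the function names, the `nearest_centroid` helper and the sample vectors are all assumptions made for the example.

```python
import math

# Each document and centroid is assumed to be a probability vector
# (non-negative entries that sum to 1).

def squared_euclidean(p, q):
    """Sum of squared component differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def squared_chi2(p, q):
    """Squared chi-square distance: sum of (p_i - q_i)^2 / (p_i + q_i).
    Components where both entries are zero contribute nothing."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def k_divergence(p, q):
    """K divergence: sum of p_i * ln(2 * p_i / (p_i + q_i)).
    Terms with p_i == 0 are taken as zero."""
    return sum(a * math.log(2 * a / (a + b)) for a, b in zip(p, q) if a > 0)

def nearest_centroid(doc, centroids, distance):
    """Hypothetical helper: the assignment step of k-means under a chosen distance."""
    return min(range(len(centroids)), key=lambda i: distance(doc, centroids[i]))

# Made-up vectors, for illustration only.
doc = [0.7, 0.2, 0.1]
centroids = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]
for measure in (squared_euclidean, squared_chi2, k_divergence):
    print(measure.__name__, nearest_centroid(doc, centroids, measure))
```

Note that K divergence is not symmetric in its arguments, so the order in which the document and the centroid are passed matters; the sketch passes the document first.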
Keywords: k-means; naive Bayes; text clustering; Arabic text
Date: 2014
Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219649214500269
Access to full text is restricted to subscribers
Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:13:y:2014:i:03:n:s0219649214500269
DOI: 10.1142/S0219649214500269
Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh
More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.