Effectiveness of Heuristic Based Approach on the Performance of Indexing and Clustering of High Dimensional Data
Alan Chen,
Shang Gao,
Tamer N. Jarada,
Ming Zhang,
Christos Pavlatos,
Panagiotis Karampelas and
Reda Alhajj ()
Additional contact information
Alan Chen: Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
Shang Gao: Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
Tamer N. Jarada: Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
Ming Zhang: Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
Christos Pavlatos: Department of Information Technology, Hellenic American University, NH, USA
Panagiotis Karampelas: Department of Information Technology, Hellenic American University, NH, USA
Reda Alhajj: Department of Computer Science, University of Calgary, Calgary, Alberta, Canada;
Journal of Information & Knowledge Management (JIKM), 2011, vol. 10, issue 03, 261-271
Abstract:
Data in practical applications (e.g., images, molecular biology, etc) is mostly characterised by high dimensionality and huge size or number of data instances. Though, feature reduction techniques have been successful in reducing the dimensionality for certain applications, dealing with high dimensional data is still an area which has received considerable attention in the research community. Indexing and clustering of high dimensional data are two of the most challenging techniques that have a wide range of applications. However, these techniques suffer from performance issues as the dimensionality and size of the processed data increases. In our effort to tackle this problem, this paper demonstrates a general optimisation technique applicable to indexing and clustering algorithms which need to calculate distances and check them against some minimum distance condition. The optimisation technique is a simple calculation that finds the minimum possible distance between two points, and checks this distance against the minimum distance condition; thus reusing already computed values and reducing the need to compute a more complicated distance function periodically. Effectiveness and usefulness of the proposed optimisation technique has been demonstrated by applying it with successful results to clustering and indexing techniques. We utilised a number of clustering techniques, including the agglomerative hierarchical clustering,k-means clustering, and DBSCAN algorithms. Runtime for all three algorithms with this optimisation scenario was reduced, and the clusters they returned were verified to remain the same as the original algorithms. The optimisation technique also shows potential for reducing runtime by a substantial amount for indexing large databases using NAQ-tree; in addition, the optimisation technique shows potential for reducing runtime as databases grow larger both in dimensionality and size.
Keywords: Clustering algorithms; DBSCAN; density-based clustering; distance computation; hierarchical clustering; k-means clustering; high dimensional indexing; NAQ-tree; performance (search for similar items in EconPapers)
Date: 2011
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219649211002973
Access to full text is restricted to subscribers
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:10:y:2011:i:03:n:s0219649211002973
Ordering information: This journal article can be ordered from
DOI: 10.1142/S0219649211002973
Access Statistics for this article
Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh
More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim ().