H-mrk-means: Enhanced Heuristic mrk-means for Linear Time Clustering of Big Data Using Hybrid Meta-heuristic Algorithm
Digvijay Puri and Deepak Gupta
Additional contact information
Digvijay Puri: Department of CSE & IT, Jaypee University of Information Technology, Waknaghat, India
Deepak Gupta: Department of CSE & IT, Jaypee University of Information Technology, Waknaghat, India
Journal of Information & Knowledge Management (JIKM), 2024, vol. 23, issue 04, 1-30
Abstract:
Big data typically combines large volume with mixed attribute types, both categorical and numerical. Among clustering methods for such data, k-prototypes has been adapted to the MapReduce framework and thus offers a better solution for very large datasets. However, k-prototypes must compute the distance between every data point and every cluster centre. Many of these distance computations are redundant, because data points often remain in the same cluster after a few iterations. For clustering huge-scale datasets, k-means is one of the efficient solutions. However, k-means is not intrinsically suited to MapReduce because it is iterative: each iteration requires an independent MapReduce job, which incurs high Input/Output (I/O) overhead at every iteration. This research paper presents a novel enhanced linear time clustering method for big data, called Heuristic mrk-means (H-mrk-means), which uses optimized k-means on the MapReduce model. To manage big data that is time series in nature, sampling and the MapReduce framework are adopted, so that the data are processed on different machines. Before the main clustering process begins, a sampling step extracts the noteworthy information. The developed method has two main phases: the map phase (divide and conquer) and the reduce phase (final clustering). In the map phase, the data are divided into chunks that are stored on their assigned machines. In the reduce phase, data clustering is performed. Here, the cluster centroids are tuned by the hybrid Tunicate-Deer Hunting Optimization (T-DHO) algorithm, which optimizes a newly derived objective function. This optimal tuning of the solution improves clustering efficiency compared with normal iterative k-means and mrk-means clustering. In experimental evaluation over varied numbers of chunks, the proposed H-mrk-means attained higher clustering quality and faster execution times than other clustering approaches.
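The two-phase structure described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' H-mrk-means: the sampling step is omitted, the T-DHO centroid tuning is replaced by plain Lloyd iterations, and the chunks are processed in a loop rather than on separate machines. The function names (`kmeans`, `map_phase`, `reduce_phase`, `chunked_kmeans`), the farthest-first initialization, and the choice to cluster each chunk locally before merging are all assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (a stand-in for the paper's T-DHO tuning).

    Assumes 1 <= k <= len(points). Uses a deterministic farthest-first
    initialization, which is an illustrative choice, not the paper's.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest current centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # An empty group keeps its previous centroid.
        centroids = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
    return centroids


def map_phase(points, n_chunks, k):
    """Divide the data into chunks and cluster each chunk independently."""
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    local_centroids = []
    for chunk in chunks:
        local_centroids.extend(kmeans(chunk, k))
    return local_centroids


def reduce_phase(local_centroids, k):
    """Final clustering over the centroids gathered from all chunks."""
    return kmeans(local_centroids, k)


def chunked_kmeans(points, n_chunks, k):
    """One map pass and one reduce pass, so no per-iteration job is launched."""
    return reduce_phase(map_phase(points, n_chunks, k), k)
```

In this sketch each data point is read once per chunk-level run and only the small set of local centroids reaches the final step, which is the general idea behind avoiding a separate MapReduce job for every k-means iteration.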
Keywords: Big data; linear time clustering; enhanced heuristic mrk-means algorithm; tunicate-deer hunting optimization; MapReduce
Date: 2024
Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219649224500540
Access to full text is restricted to subscribers
Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:23:y:2024:i:04:n:s0219649224500540
DOI: 10.1142/S0219649224500540
Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh
More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.