EconPapers    
Economics at your fingertips  
 

High-Performance Hybrid Algorithm for Minimum Sum-of-Squares Clustering of Infinitely Tall Data

Ravil Mussabayev () and Rustam Mussabayev
Additional contact information
Ravil Mussabayev: Department of Mathematics, University of Washington, Padelford Hall C-138, Seattle, WA 98195-4350, USA
Rustam Mussabayev: AI Research Lab, Satbayev University, Satbaev Str. 22, Almaty 050013, Kazakhstan

Mathematics, 2024, vol. 12, issue 13, 1-25

Abstract: This paper introduces a novel formulation of the clustering problem, namely, the minimum sum-of-squares clustering of infinitely tall data (MSSC-ITD), and presents HPClust, an innovative set of hybrid parallel approaches for its effective solution. By utilizing modern high-performance computing techniques, HPClust enhances key clustering metrics: effectiveness, computational efficiency, and scalability. In contrast to vanilla data parallelism, which only accelerates processing time through the MapReduce framework, our approach unlocks superior performance by leveraging the multi-strategy competitive–cooperative parallelism and intricate properties of the objective function landscape. Unlike other available algorithms that struggle to scale, our algorithm is inherently parallel in nature, improving solution quality through increased scalability and parallelism and outperforming even advanced algorithms designed for small- and medium-sized datasets. Our evaluation of HPClust, featuring four parallel strategies, demonstrates its superiority over traditional and cutting-edge methods by offering better performance in the key metrics. These results also show that parallel processing not only enhances the clustering efficiency, but the accuracy as well. Additionally, we explore the balance between computational efficiency and clustering quality, providing insights into optimal parallel strategies based on dataset specifics and resource availability. This research advances our understanding of parallelism in clustering algorithms, demonstrating that a judicious hybridization of advanced parallel approaches yields optimal results for MSSC-ITD. Experiments on the synthetic data further confirm HPClust’s exceptional scalability and robustness to noise.

Keywords: HPClust algorithm; clustering; parallel processing; big data; large-scale datasets; minimum sum-of-squares; decomposition; K-means; K-means++; global optimization; hybrid approach; adaptive algorithm; data sampling; multi-strategy optimization; high-performance computing (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2024
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2227-7390/12/13/1930/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/13/1930/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:13:p:1930-:d:1419783

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:12:y:2024:i:13:p:1930-:d:1419783