POFCM: A Parallel Fuzzy Clustering Algorithm for Large Datasets

Pérez-Ortega, Joaquín; Rey-Figueroa, César David; Roblero-Aguilar, Sandra Silvia; Almanza-Ortega, Nelva Nely; Zavala-Díaz, Crispín; García-Paredes, Salomón; Landero-Nájera, Vanesa

Joaquín Pérez-Ortega, César David Rey-Figueroa, Sandra Silvia Roblero-Aguilar, Nelva Nely Almanza-Ortega, Crispín Zavala-Díaz, Salomón García-Paredes and Vanesa Landero-Nájera
Additional contact information
Joaquín Pérez-Ortega: Tecnológico Nacional de México/CENIDET, Cuernavaca 62490, Mexico
César David Rey-Figueroa: Tecnológico Nacional de México/CENIDET, Cuernavaca 62490, Mexico
Sandra Silvia Roblero-Aguilar: Tecnológico Nacional de México/CENIDET, Cuernavaca 62490, Mexico
Nelva Nely Almanza-Ortega: Tecnológico Nacional de México/IT de Tlalnepantla, Tlalnepantla de Baz 54070, Mexico
Crispín Zavala-Díaz: Faculty of Accounting, Administration and Informatic, Universidad Autónoma del Estado de Morelos, Cuernavaca 62209, Mexico
Salomón García-Paredes: Tecnológico Nacional de México/IT de Tlalpan, Tlalpan 14500, Mexico
Vanesa Landero-Nájera: Computer Systems, Universidad Politécnica de Apodaca, Apodaca 66600, Mexico

Mathematics, 2023, vol. 11, issue 8, 1-16

Abstract: Clustering algorithms have proven to be useful tools for extracting knowledge and supporting decision making when processing large volumes of data. Hard and fuzzy clustering algorithms have been used successfully to identify patterns and trends in many areas, such as finance, healthcare, and marketing. However, the solution time of these algorithms increases significantly as the size of the datasets increases, which can make their use infeasible. Parallel processing has proven to be an efficient way to reduce this solution time. It has been established that implementing an algorithm in parallel requires redesigning it to make the best use of the hardware resources of the target platform. In this article, we propose a new parallel implementation, in OpenMP, of the Hybrid OK-Means Fuzzy C-Means (HOFCM) algorithm, which is an efficient variant of Fuzzy C-Means. An advantage of using OpenMP is its scalability. The efficiency of the implementation is compared against that of the HOFCM algorithm. The experimental results of processing large real and synthetic datasets show that our implementation tends to solve instances with a large number of clusters and dimensions more efficiently. Additionally, the implementation shows excellent results on the speedup and parallel efficiency metrics. Our main contribution is a fuzzy clustering algorithm for large datasets that is scalable and not limited to a specific domain.
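For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of one iteration of the textbook Fuzzy C-Means algorithm, together with the usual definitions of speedup and parallel efficiency. It is not the HOFCM or POFCM algorithm from the article, whose details this record does not give. All function names, the fuzzifier default `m=2.0`, and the sample data are illustrative assumptions. The membership of each point is computed independently of the other points, which is the kind of loop a parallel implementation could plausibly distribute across threads.

```python
# Textbook Fuzzy C-Means step (NOT the article's HOFCM/POFCM variant).
# Names, defaults, and data below are hypothetical, for illustration only.

def update_memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which point i belongs to cluster j."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:  # each point is independent of the others
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5 for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: share it among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        )
    return memberships


def update_centers(points, memberships, m=2.0):
    """Return new centers as membership-weighted means of the points.

    Assumes every cluster receives a non-zero total weight.
    """
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * x[d] for w, x in zip(weights, points)) / total for d in range(dim)]
        )
    return centers


def speedup(t_serial, t_parallel):
    """Speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_threads):
    """Parallel efficiency: speedup divided by the number of threads."""
    return speedup(t_serial, t_parallel) / n_threads


# Illustrative data: two well-separated groups and two rough initial centers.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers = [[1.0, 1.0], [9.0, 9.0]]
u = update_memberships(points, centers)
new_centers = update_centers(points, u)
```

In practice these two updates are repeated until the centers or memberships stop changing by more than a chosen tolerance. The timing values passed to `speedup` and `parallel_efficiency` would come from measured runs, which this sketch does not attempt to reproduce.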

Keywords: big data; fuzzy clustering; fuzzy C-means algorithm; OpenMP; parallel computing
JEL-codes: C
Date: 2023

Downloads:
https://www.mdpi.com/2227-7390/11/8/1920/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/8/1920/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:8:p:1920-:d:1127317

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.