Density-Based Clustering to Deal with Highly Imbalanced Data in Multi-Class Problems

Mondragón, Julio Cesar Munguía; Lara, Eréndira Rendón; Eleuterio, Roberto Alejo; Gutiérrez, Everardo Efrén Granda; López, Federico Del Razo

Julio Cesar Munguía Mondragón, Eréndira Rendón Lara, Roberto Alejo Eleuterio, Everardo Efrén Granda Gutiérrez and Federico Del Razo López
Additional contact information
Julio Cesar Munguía Mondragón: Division of Postgraduate Studies and Research, National Technological of Mexico (TecNM), Instituto Tecnológico de Toluca, Metepec 52149, Estado de Mexico, Mexico
Eréndira Rendón Lara: Division of Postgraduate Studies and Research, National Technological of Mexico (TecNM), Instituto Tecnológico de Toluca, Metepec 52149, Estado de Mexico, Mexico
Roberto Alejo Eleuterio: Division of Postgraduate Studies and Research, National Technological of Mexico (TecNM), Instituto Tecnológico de Toluca, Metepec 52149, Estado de Mexico, Mexico
Everardo Efrén Granda Gutiérrez: University Center at Atlacomulco, Autonomous University of the State of Mexico (UAEMex), Atlacomulco 50400, Estado de Mexico, Mexico
Federico Del Razo López: Division of Postgraduate Studies and Research, National Technological of Mexico (TecNM), Instituto Tecnológico de Toluca, Metepec 52149, Estado de Mexico, Mexico

Mathematics, 2023, vol. 11, issue 18, 1-15

Abstract: In machine learning and data mining applications, an imbalanced distribution of classes in the training dataset can drastically affect the performance of learning models. The class imbalance problem is frequently observed in real-world classification tasks, when far fewer instances are available for one class than for the others. Machine learning algorithms that do not account for class imbalance can become strongly biased towards the majority class, while the minority class is usually neglected. Sampling techniques, mainly based on random undersampling and oversampling, have therefore been used extensively to overcome class imbalance. However, there is still no definitive solution, especially for multi-class problems. This work studies a strategy that combines density-based clustering algorithms with random undersampling and oversampling techniques. To analyze the performance of the studied method, an experimental validation was carried out on a collection of hyperspectral remote sensing images, with a deep learning neural network as the classifier. This data bank contains six datasets with different imbalance ratios, from slight to severe. In the experiments, the studied method outperforms other state-of-the-art methods in classification performance, measured by the geometric mean of the precision, mainly on highly imbalanced datasets.
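
The abstract does not say which density-based algorithm is used or how it is combined with the sampling step, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes DBSCAN as the clustering algorithm, applies it separately within each class to discard noise points, and then randomly undersamples or oversamples every class to the mean class size. The names `dbscan_labels` and `balance_classes`, the parameters `eps` and `min_pts`, and the choice of target size are all illustrative assumptions.

```python
import random
from collections import defaultdict
from math import dist


def dbscan_labels(points, eps, min_pts):
    """Minimal DBSCAN: one cluster id per point, -1 for noise."""
    n = len(points)
    # Brute-force eps-neighborhoods (each point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels


def balance_classes(X, y, eps, min_pts, seed=0):
    """Drop per-class DBSCAN noise, then resample classes to the mean size.

    Assumption: this two-step scheme is an illustration only; the abstract
    does not specify the exact procedure.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)

    # Step 1: density-based cleaning inside each class.
    cleaned = {}
    for label, pts in by_class.items():
        labels = dbscan_labels(pts, eps, min_pts)
        kept = [p for p, c in zip(pts, labels) if c != -1]
        cleaned[label] = kept or pts  # keep the class if everything was noise

    # Step 2: random undersampling / oversampling to a common size.
    target = round(sum(len(p) for p in cleaned.values()) / len(cleaned))
    X_out, y_out = [], []
    for label, pts in cleaned.items():
        if len(pts) > target:
            sample = rng.sample(pts, target)  # undersample without replacement
        else:
            # Oversample by duplicating randomly chosen instances.
            sample = pts + rng.choices(pts, k=target - len(pts))
        X_out.extend(sample)
        y_out.extend([label] * len(sample))
    return X_out, y_out
```

Calling `balance_classes(X, y, eps=0.35, min_pts=3)` on a list of feature tuples `X` and labels `y` returns a resampled pair in which every class has the same number of instances. In practice `eps` and `min_pts` would need tuning per dataset, and a library implementation of DBSCAN would replace the quadratic-time version shown here.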

Keywords: density-based clustering algorithms; sampling methods; deep neural networks
JEL-codes: C
Date: 2023

Downloads:
https://www.mdpi.com/2227-7390/11/18/4008/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/18/4008/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:18:p:4008-:d:1244681

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.