Assessing clustering methods using Shannon's entropy
Anis Hoayek and Didier Rullière
Additional contact information
Anis Hoayek: LIMOS - Laboratoire d'Informatique, de Modélisation et d'Optimisation des Systèmes - ENSM ST-ETIENNE - Ecole Nationale Supérieure des Mines de St Etienne - CNRS - Centre National de la Recherche Scientifique - UCA - Université Clermont Auvergne - INP Clermont Auvergne - Institut national polytechnique Clermont Auvergne - UCA - Université Clermont Auvergne, FAYOL-ENSMSE - Institut Henri Fayol - Mines Saint-Étienne MSE - École des Mines de Saint-Étienne - IMT - Institut Mines-Télécom [Paris], FAYOL-ENSMSE - Département Génie mathématique et industriel - ENSM ST-ETIENNE - Ecole Nationale Supérieure des Mines de St Etienne - Institut Henri Fayol, Mines Saint-Étienne MSE - École des Mines de Saint-Étienne - IMT - Institut Mines-Télécom [Paris]
Didier Rullière: LIMOS - Laboratoire d'Informatique, de Modélisation et d'Optimisation des Systèmes - ENSM ST-ETIENNE - Ecole Nationale Supérieure des Mines de St Etienne - CNRS - Centre National de la Recherche Scientifique - UCA - Université Clermont Auvergne - INP Clermont Auvergne - Institut national polytechnique Clermont Auvergne - UCA - Université Clermont Auvergne, FAYOL-ENSMSE - Institut Henri Fayol - Mines Saint-Étienne MSE - École des Mines de Saint-Étienne - IMT - Institut Mines-Télécom [Paris], FAYOL-ENSMSE - Département Génie mathématique et industriel - ENSM ST-ETIENNE - Ecole Nationale Supérieure des Mines de St Etienne - Institut Henri Fayol, Mines Saint-Étienne MSE - École des Mines de Saint-Étienne - IMT - Institut Mines-Télécom [Paris]
Post-Print from HAL
Abstract:
Unsupervised clustering techniques are a valuable source of information for determining how to divide a dataset into subgroups. We present a comprehensive analysis of the quality of these algorithms by defining a clustering fuzziness metric. Based on this metric, we provide a statistical test and corrections to the cluster probabilities. Examples demonstrate how the metric can be used to compare clustering algorithms or to improve the accuracy of various methods. An application to adjusting the number of clusters is also presented. The results are illustrated using both simulated and real-world data.
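As a rough illustration of what an entropy-based fuzziness measure can look like, the sketch below computes the Shannon entropy of each point's cluster-membership probabilities and averages it over the dataset. This is an assumption made for illustration only: the abstract does not state the paper's exact metric, test, or corrections, and the function names `membership_entropy` and `mean_normalized_entropy` are hypothetical.

```python
import math


def membership_entropy(probs):
    """Shannon entropy (in nats) of one point's cluster-membership probabilities."""
    # Terms with p == 0 contribute nothing, following the convention 0 * log(0) = 0.
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_normalized_entropy(memberships):
    """Average per-point entropy divided by log(K), so the value lies in [0, 1].

    `memberships` is a list of rows, one per data point, each row holding K
    probabilities that sum to 1. This normalization is an illustrative choice,
    not necessarily the one used in the paper.
    """
    k = len(memberships[0])
    if k < 2:
        return 0.0  # a single cluster carries no assignment uncertainty
    total = sum(membership_entropy(row) for row in memberships)
    return total / (len(memberships) * math.log(k))


# Hard (crisp) assignments versus maximally uncertain ones, for K = 2.
crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
crisp_score = mean_normalized_entropy(crisp)
fuzzy_score = mean_normalized_entropy(fuzzy)
```

Under this sketch, a score near 0 indicates near-deterministic assignments and a score near 1 indicates assignments close to uniform, which is one plausible way two soft clustering outputs on the same data could be compared.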
Date: 2024-09-27
Note: View the original document on HAL open archive server: https://hal.science/hal-03812055v2
Published in Information Sciences, 2024, 689, pp.121510. ⟨10.1016/j.ins.2024.121510⟩
Downloads: (external link)
https://hal.science/hal-03812055v2/document (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-03812055
DOI: 10.1016/j.ins.2024.121510