Unsupervised ranking of clustering algorithms by INFOMAX

Sikdar, Sandipan; Mukherjee, Animesh; Marsili, Matteo

Unsupervised ranking of clustering algorithms by INFOMAX

Sandipan Sikdar, Animesh Mukherjee and Matteo Marsili

PLOS ONE, 2020, vol. 15, issue 10, 1-21

Abstract: Clustering and community detection provide a concise way of extracting meaningful information from large datasets. An ever growing plethora of data clustering and community detection algorithms have been proposed. In this paper, we address the question of ranking the performance of clustering algorithms for a given dataset. We show that, for hard clustering and community detection, Linsker’s Infomax principle can be used to rank clustering algorithms. In brief, the algorithm that yields the highest value of the entropy of the partition, for a given number of clusters, is the best one. We show indeed, on a wide range of datasets of various sizes and topological structures, that the ranking provided by the entropy of the partition over a variety of partitioning algorithms is strongly correlated with the overlap with a ground truth partition The codes related to the project are available in https://github.com/Sandipan99/Ranking_cluster_algorithms.

Date: 2020
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0239331 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 39331&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0239331

DOI: 10.1371/journal.pone.0239331

Access Statistics for this article

More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone ().