Efficient similarity-based data clustering by optimal object to cluster reallocation

Rossignol, Mathias; Lagrange, Mathieu; Cont, Arshia

PLOS ONE, 2018, vol. 13, issue 6, 1-22

Abstract: We present an iterative flat hard clustering algorithm designed to operate on arbitrary similarity matrices, with the only constraint that these matrices be symmetric. Although functionally very close to kernel k-means, our proposal maximizes average intra-class similarity rather than minimizing a squared distance, in order to remain closer to the semantics of similarities. We show that this approach relaxes some conditions on usable affinity matrices, such as positive semi-definiteness, and opens possibilities for the computational optimization required for large datasets. Systematic evaluation on a variety of datasets shows that, compared with kernel k-means and spectral clustering methods, the proposed approach gives equivalent or better performance while running much faster. Most notably, it significantly reduces memory access, which makes it a good choice for large data collections. Material enabling the reproducibility of the results is made available online.
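
The general idea of reallocating objects between clusters to raise average intra-class similarity can be illustrated with a short sketch. This is not the authors' algorithm or code: the objective used here (the mean, over objects, of each object's average similarity to the other members of its cluster), the function names `avg_intra_similarity` and `reallocate`, and the greedy sweep order are all assumptions made for illustration. The sketch recomputes the objective from scratch for every candidate move, so it reflects none of the computational or memory-access optimizations the abstract mentions.

```python
import numpy as np


def avg_intra_similarity(S, labels, k):
    """Mean, over objects, of each object's average similarity to the
    other members of its cluster (an assumed objective, see lead-in)."""
    total = 0.0
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        n = len(idx)
        if n < 2:
            continue  # assumption: a singleton or empty cluster contributes 0
        block = S[np.ix_(idx, idx)]
        # Off-diagonal sum divided by (n - 1) equals the sum, over members,
        # of their mean similarity to the other members.
        total += (block.sum() - np.trace(block)) / (n - 1)
    return total / len(labels)


def reallocate(S, labels, k, max_sweeps=100):
    """Greedy sweeps: move each object to the cluster that most improves
    the objective, until a full sweep makes no move."""
    S = np.asarray(S, dtype=float)
    if not np.allclose(S, S.T):
        raise ValueError("similarity matrix must be symmetric")
    labels = np.asarray(labels).copy()
    best = avg_intra_similarity(S, labels, k)
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(labels)):
            home = labels[i]
            if np.count_nonzero(labels == home) == 1:
                continue  # never empty a cluster
            best_c = home
            for c in range(k):
                if c == home:
                    continue
                labels[i] = c  # tentative move, scored naively
                score = avg_intra_similarity(S, labels, k)
                if score > best + 1e-12:
                    best, best_c = score, c
            labels[i] = best_c  # commit the best move, or restore
            moved = moved or best_c != home
        if not moved:
            break
    return labels, best


# Toy example: two groups of three objects, one object starts misassigned.
S = np.full((6, 6), 0.1)
S[:3, :3] = 0.9
S[3:, 3:] = 0.9
np.fill_diagonal(S, 1.0)
final_labels, final_score = reallocate(S, [0, 0, 1, 1, 1, 1], k=2)
```

Because a move is accepted only when it strictly increases the objective, and the number of partitions is finite, the sweeps terminate. The sketch only requires the matrix to be symmetric, in line with the constraint stated in the abstract.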

Date: 2018

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0197450 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 97450&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0197450

DOI: 10.1371/journal.pone.0197450

More articles in PLOS ONE from Public Library of Science