Clustering
Vladimir Shikhman and David Müller
Vladimir Shikhman: Chemnitz University of Technology
David Müller: Chemnitz University of Technology
Chapter 5 in Mathematical Foundations of Big Data Analytics, 2021, pp 87-105 from Springer
Abstract:
Clustering aims to group a set of objects so that objects within the same cluster are more similar to each other than to objects in other clusters. Depending on the objects' features, one may need to cluster DNA sequences of genes, members of a social network, texts written in natural languages, time series of stock prices, medical images from computed tomography, or consumer products on e-commerce platforms. Clustering itself is not a specific algorithm but a task to be solved. It can be achieved by various algorithms that differ significantly in what they consider a cluster and in how they identify clusters efficiently. In this chapter, we present the celebrated k-means clustering based on a general dissimilarity measure between objects. In the first step, the algorithm assigns each object to the cluster with the least dissimilar center. In the second step, the centers are recalculated by minimizing the dissimilarity within the clusters. The k-means algorithm is specified for the Euclidean setup, where the centers turn out to be the clusters' sample means. Additionally, we discuss modifications of k-means for other dissimilarity measures, including the Levenshtein distance, Manhattan norm, cosine similarity, Pearson correlation, and Jaccard coefficient. Finally, spectral clustering is applied to community detection. It is based on the diffusion of information through a social network and on the spectral analysis of the corresponding matrix of transition probabilities.
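The two alternating steps of k-means under a general dissimilarity measure can be sketched as follows. This is a minimal illustration, not the chapter's own code. It assumes numeric feature vectors, user-supplied initial centers, and a fixed iteration cap. The function names (`k_means`, `mean_center`, `median_center`) are hypothetical. Squared Euclidean distance is paired with the sample mean, which minimizes it within a cluster. As one example of a modification, Manhattan distance is paired with the coordinate-wise median, the usual k-medians variant.

```python
import statistics
from typing import Callable, List, Sequence

Point = List[float]

def sq_euclidean(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean dissimilarity between an object and a center."""
    return sum((a - b) ** 2 for a, b in zip(x, c))

def manhattan(x: Sequence[float], c: Sequence[float]) -> float:
    """Manhattan (L1) dissimilarity between an object and a center."""
    return sum(abs(a - b) for a, b in zip(x, c))

def mean_center(cluster: List[Point]) -> Point:
    """Minimizer of summed squared Euclidean distances: the sample mean."""
    return [statistics.fmean(coords) for coords in zip(*cluster)]

def median_center(cluster: List[Point]) -> Point:
    """Minimizer of summed Manhattan distances: the coordinate-wise median."""
    return [statistics.median(coords) for coords in zip(*cluster)]

def k_means(points: List[Point], centers: List[Point],
            dissim: Callable[[Sequence[float], Sequence[float]], float],
            update: Callable[[List[Point]], Point],
            max_iter: int = 100):
    """Hypothetical helper: alternate assignment and center updates until stable."""
    centers = [list(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Step 1: assign each object to the least dissimilar center.
        labels = [min(range(len(centers)), key=lambda j: dissim(p, centers[j]))
                  for p in points]
        # Step 2: recompute each center by minimizing dissimilarity within its cluster.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(update(members) if members else c)  # keep empty clusters' center
        if new_centers == centers:  # no change: a fixed point has been reached
            break
        centers = new_centers
    return labels, centers
```

For example, `k_means(pts, [[0, 0], [10, 10]], sq_euclidean, mean_center)` on two well-separated point groups assigns each group its own label and moves each center to the group's mean. Passing `manhattan` and `median_center` instead gives the median-based variant.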
Date: 2021
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-662-62521-7_5
Ordering information: This item can be ordered from http://www.springer.com/9783662625217
DOI: 10.1007/978-3-662-62521-7_5
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.