Economics at your fingertips  

An application of the minimal spanning tree approach to the cluster stability problem

Z. Volkovich (), Z. Barzily (), G.-W. Weber (), D. Toledano-Kitai () and R. Avros ()

Central European Journal of Operations Research, 2012, vol. 20, issue 1, 119-139

Abstract: Among the areas of data and text mining which are employed today in OR, science, economy and technology, clustering theory serves as a preprocessing step in the data analyzing. An important component of clustering theory is determination of the true number of clusters. This problem has not been satisfactorily solved. In our paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters, we estimate the stability of the partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Clusters validity is measured by the total number of edges, in the clusters’ minimal spanning trees, connecting points from different samples. Actually, we use the Friedman and Rafsky two sample test statistic. The homogeneity hypothesis of well mingled samples, within the clusters, leads to an asymptotic normal distribution of the considered statistic. Resting upon this fact, the standard score of the mentioned edges quantity is set, and the partition quality is represented by the worst cluster, corresponding to the minimal standard score value. It is natural to expect that the true number of clusters can be characterized by the empirical distribution having the shortest left tail. The proposed methodology sequentially creates the described distribution and estimates its left-asymmetry. Several presented numerical experiments demonstrate the ability of the approach to detect the true number of clusters. Copyright Springer-Verlag 2012

Keywords: Clustering; Cluster stability; Minimal spanning tree; Two sample test; Data mining (search for similar items in EconPapers)
Date: 2012
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1) Track citations by RSS feed

Downloads: (external link) (text/html)
Access to full text is restricted to subscribers.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link:

Ordering information: This journal article can be ordered from
http://www.springer. ... search/journal/10100

DOI: 10.1007/s10100-010-0157-4

Access Statistics for this article

Central European Journal of Operations Research is currently edited by Ulrike Leopold-Wildburger

More articles in Central European Journal of Operations Research from Springer, Slovak Society for Operations Research, Hungarian Operational Research Society, Czech Society for Operations Research, Österr. Gesellschaft für Operations Research (ÖGOR), Slovenian Society Informatika - Section for Operational Research, Croatian Operational Research Society
Bibliographic data for series maintained by Sonal Shukla ().

Page updated 2020-04-23
Handle: RePEc:spr:cejnor:v:20:y:2012:i:1:p:119-139