SMART: a subspace clustering algorithm that automatically identifies the appropriate number of clusters

Jing, Liping; Li, Junjie; Ng, Michael K.; Cheung, Yiu-ming; Huang, Joshua

International Journal of Data Mining, Modelling and Management, 2009, vol. 1, issue 2, 149-177

Abstract: This paper presents a subspace k-means clustering algorithm for high-dimensional data that selects k automatically. A new penalty term is added to the objective function of the fuzzy k-means clustering process so that several clusters compete for objects. This competition causes some cluster centres to merge, which identifies the 'true' number of clusters. The algorithm determines the number of clusters in a dataset by adjusting the penalty term factor. A subspace cluster validation index is proposed and used to verify the subspace clustering results generated by the algorithm. Experimental results on both synthetic and real data demonstrate that the algorithm produces consistent clustering results and the correct number of clusters. Several real datasets are also used to show how the proposed algorithm can identify interesting sub-clusters in the data.
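The abstract does not state SMART's objective function, so its penalty term cannot be reproduced here. As a rough illustration of the general mechanism it describes (start with more centres than needed and let a penalty factor decide how many survive as distinct centres), the Python sketch below swaps in a different, well-known penalty: the entropy regularizer of maximum-entropy (entropy-regularized) fuzzy k-means. It also omits subspace feature weighting. The function names `entropy_fuzzy_kmeans` and `count_distinct_centres`, the farthest-point initialization, the penalty factor `T` and the merge tolerance `tol` are hypothetical choices for this sketch, not the paper's.

```python
import numpy as np


def entropy_fuzzy_kmeans(X, k_max, T, n_iter=300):
    """Entropy-regularized fuzzy k-means started from k_max centres.

    Minimizes sum_{l,j} u_lj * ||x_j - z_l||^2 + T * sum_{l,j} u_lj * log(u_lj)
    subject to sum_l u_lj = 1. ASSUMPTION: this entropy penalty stands in for
    SMART's penalty term, which the abstract does not give. A larger T makes
    more centres coalesce; no subspace feature weights are used here.
    """
    X = np.asarray(X, dtype=float)
    # Deterministic farthest-point initialization (hypothetical choice) so that
    # each well-separated group receives at least one starting centre.
    idx = [0]
    min_d = np.sum((X - X[0]) ** 2, axis=1)
    for _ in range(1, k_max):
        nxt = int(np.argmax(min_d))
        idx.append(nxt)
        min_d = np.minimum(min_d, np.sum((X - X[nxt]) ** 2, axis=1))
    Z = X[idx].copy()

    for _ in range(n_iter):
        # Squared Euclidean distances, shape (k_max, n_points).
        D = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        # Membership update u_lj proportional to exp(-D_lj / T); subtracting the
        # per-point minimum keeps exp() from underflowing for small T.
        U = np.exp(-(D - D.min(axis=0, keepdims=True)) / T)
        U /= U.sum(axis=0, keepdims=True)
        # Centre update: membership-weighted means of the data.
        Z = (U @ X) / (U.sum(axis=1, keepdims=True) + 1e-12)
    return Z, U


def count_distinct_centres(Z, tol=1e-3):
    """Greedily group centres closer than tol and count the groups."""
    reps = []
    for z in Z:
        if all(np.linalg.norm(z - r) > tol for r in reps):
            reps.append(z)
    return len(reps)


# Synthetic data: three well-separated 2-D groups of 50 points each.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([m + 0.5 * rng.standard_normal((50, 2)) for m in means])

for T in (1000.0, 4.0):
    Z, _ = entropy_fuzzy_kmeans(X, k_max=6, T=T)
    print(T, count_distinct_centres(Z))
```

With a large `T` the entropy term dominates and all six starting centres collapse onto the overall mean; with a small `T` they collapse only within each synthetic group, so counting distinct centres recovers three. Sweeping `T` and looking for a count that stays stable over a wide range of values is one plausible way to pick k, but the paper's own selection rule and its subspace validation index may differ.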

Keywords: data mining; subspace clustering; fuzzy k-means; cluster numbers; weighting; high-dimensional data.
Date: 2009

Downloads: http://www.inderscience.com/link.php?id=26074 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:1:y:2009:i:2:p:149-177

More articles in International Journal of Data Mining, Modelling and Management from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.