New tools for evaluating the results of cluster analyses
Hildegard Schaeper (HIS)
German Stata Users' Group Meetings 2006 from Stata Users Group
Abstract:
Clustering methods are designed to find groups in data, i.e., to place similar objects (variables or observations) in the same cluster and dissimilar objects in separate clusters. Although the main idea is simple, carrying out a cluster analysis remains a challenging task. The number of clustering methods is huge, and clustering involves many choices: the decision between basic approaches (e.g., hierarchical and partitioning methods), the choice of a dissimilarity or similarity measure, the selection of a particular linkage method in a hierarchical agglomerative cluster analysis, the choice of an initial partition in a partitioning cluster analysis, and the determination of the appropriate number of clusters. Each of these decisions can affect the classification results. Apart from two commands for determining the number of clusters (cluster stop, cluster dendrogram), Stata has no built-in tools for examining clustering results. We therefore developed some simple tools that provide further evaluation criteria:
* programs that assist in determining the number of clusters (Mojena’s stopping rules for hierarchical clustering techniques; the PRE coefficient, F-Max statistic, and Beale’s F values for a partitioning cluster analysis),
* a program for testing the stability of classifications produced by different cluster analyses (Rand index), and
* a program that computes eta squared (ETA2) to assess how well the clustering variables separate the clusters.
The presentation will compare these programs with other cluster-analysis tools (agglomeration schedule, scree diagram).
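As a rough illustration of two of the criteria named in the abstract, the sketch below computes a Rand index for two classifications of the same objects and an eta-squared value for one clustering variable. It is written in Python rather than Stata and is not the author's program: the function names `rand_index` and `eta_squared` are hypothetical, the Rand index is taken in its common textbook form (share of object pairs on which two partitions agree), and eta squared is assumed to be the usual ratio of between-cluster to total sum of squares.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of object pairs that two partitions treat the same way.

    A pair counts as an agreement when both partitions put the two objects
    in the same cluster, or both put them in different clusters.
    (Hypothetical helper, not part of Stata or of the presented programs.)
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    if len(labels_a) < 2:
        raise ValueError("at least two objects are required")
    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agreements += 1
        pairs += 1
    return agreements / pairs


def eta_squared(values, labels):
    """Between-cluster share of the total sum of squares for one variable.

    Assumed definition: 1 - SS_within / SS_total. Values near 1 suggest the
    variable separates the clusters well; values near 0 suggest it does not.
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("variable is constant; eta squared is undefined")
    # Collect the values of each cluster.
    clusters = {}
    for value, label in zip(values, labels):
        clusters.setdefault(label, []).append(value)
    # Sum squared deviations from each cluster's own mean.
    ss_within = 0.0
    for members in clusters.values():
        cluster_mean = sum(members) / len(members)
        ss_within += sum((v - cluster_mean) ** 2 for v in members)
    return 1 - ss_within / ss_total
```

For example, `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns 1.0 because the two partitions are identical up to relabeling, and `eta_squared([1, 1, 5, 5], [0, 0, 1, 1])` returns 1.0 because all variation lies between the clusters.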
Date: 2006-05-24
Downloads:
http://fmwww.bc.edu/repec/dsug2006/schaeper_pres_short.ppt
Persistent link: https://EconPapers.repec.org/RePEc:boc:dsug06:08