Evaluation of Selected Approaches to Clustering Categorical Variables

Zdeněk Šulc and Hana Řezanková

Statistics in Transition new series, 2014, vol. 15, issue 4, 591-610

Abstract: This paper focuses on recently proposed similarity measures and their performance in categorical variable clustering. It compares clustering results using three recently developed similarity measures (IOF, OF and Lin measures) with results obtained using two association measures for nominal variables (Cramér’s V and the uncertainty coefficient) and with the simple matching coefficient (the overlap measure). To eliminate the influence of a particular linkage method on the structure of final clusters, three linkage methods are examined (complete, single, average). The created groups (clusters) of variables can be considered as the basis for dimensionality reduction, e.g. by choosing one of the variables from a given group as a representative for the whole group. The quality of resulting clusters is evaluated by the within-cluster variability, expressed by the WCM coefficient, and by dendrogram analysis. The examined similarity measures are compared and evaluated using two real data sets from a social survey.
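The variable-clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only one of the compared measures (Cramér's V), it assumes the dissimilarity between two variables is taken as 1 - V, and the function names `cramers_v` and `cluster_variables` as well as the toy data are hypothetical. The three linkage options mirror the ones named in the abstract (single, complete, average).

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cramers_v(x, y):
    """Cramér's V between two equally long sequences of nominal categories."""
    n = len(x)
    joint = Counter(zip(x, y))   # observed cell counts
    row = Counter(x)             # marginal counts of x
    col = Counter(y)             # marginal counts of y
    k = min(len(row), len(col))
    if k < 2:
        return 0.0  # a constant variable shows no association
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


def cluster_variables(data, n_clusters, linkage="average"):
    """Agglomerative clustering of variables (dict keys) into n_clusters groups.

    Assumption of this sketch: dissimilarity between variables is 1 - V.
    """
    names = list(data)
    dist = {
        frozenset(pair): 1.0 - cramers_v(data[pair[0]], data[pair[1]])
        for pair in combinations(names, 2)
    }
    combine = {
        "single": min,                          # nearest members
        "complete": max,                        # farthest members
        "average": lambda d: sum(d) / len(d),   # mean over all member pairs
    }[linkage]

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        def between(pair):
            i, j = pair
            return combine(
                [dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j]]
            )

        # Merge the two clusters with the smallest linkage distance (i < j).
        i, j = min(combinations(range(len(clusters)), 2), key=between)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data invented for illustration: "a" and "b" are perfectly associated,
# "c" is unrelated to both.
toy = {
    "a": ["x", "x", "y", "y", "z", "z"],
    "b": ["p", "p", "q", "q", "r", "r"],
    "c": ["u", "v", "u", "v", "u", "v"],
}
groups = {m: cluster_variables(toy, 2, linkage=m) for m in ("single", "complete", "average")}
```

On the toy data every linkage method places `a` and `b` in one group and leaves `c` alone; one variable from each group could then serve as the group's representative, in the spirit of the dimensionality reduction mentioned in the abstract.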

Keywords: variable clustering; nominal variables; association measures; similarity measures
Date: 2014
Downloads: (external link)
http://index.stat.gov.pl/repec/files/csb/stintr/csb_stintr_v15_2014_i4_n7.pdf Main text (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:csb:stintr:v:15:y:2014:i:4:p:591-610

Statistics in Transition new series is currently edited by Włodzimierz Okrasa

Series: Statistics in Transition new series, from Główny Urząd Statystyczny (Polska).
Bibliographic data for series maintained by Beata Witek.

Handle: RePEc:csb:stintr:v:15:y:2014:i:4:p:591-610