THE EFFECT OF BINARY DATA TRANSFORMATION IN CATEGORICAL DATA CLUSTERING

Cibulková Jana, Šulc Zdeněk, Sirota Sergej and Řezanková Hana
Additional contact information
Cibulková Jana: Department of Statistics and Probability, University of Economics, Prague, Czech Republic.
Šulc Zdeněk: Department of Statistics and Probability, University of Economics, Prague, Czech Republic.
Sirota Sergej: Department of Statistics and Probability, University of Economics, Prague, Czech Republic.
Řezanková Hana: Department of Statistics and Probability, University of Economics, Prague, Czech Republic.

Statistics in Transition New Series, 2019, vol. 20, issue 2, 33-47

Abstract: This paper focuses on hierarchical clustering of categorical data and compares two approaches that can be used for this task. The first, an extremely common approach, is to perform a binary transformation of the categorical variables into sets of dummy variables and then use similarity measures suited to binary data. These similarity measures are well examined, and they are available in both commercial and non-commercial software. However, a binary transformation may cause a loss of information in the data or slow down the computations. The second approach uses similarity measures developed for categorical data. However, these measures are not as well examined as the binary ones, and they are not implemented in commercial software. The two approaches are compared on generated data sets with categorical variables, and the evaluation uses both internal and external evaluation criteria. The purpose of this paper is to show that the binary transformation is not necessary when clustering categorical data, since the second approach leads to clustering results at least comparable to those of the first approach.
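
The two approaches described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy data, the function names, and the choice of the Jaccard coefficient (for the dummy-coded data) and the simple matching (overlap) coefficient (for the categorical data). The abstract does not say which measures the authors compared. The sketch only builds the pairwise dissimilarities that a hierarchical clustering routine would take as input.

```python
from itertools import combinations

# Hypothetical toy data: 4 objects, 3 nominal variables (not from the paper).
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]


def dummy_encode(rows):
    """Binary transformation: one 0/1 indicator per (variable, category) pair."""
    n_vars = len(rows[0])
    categories = [sorted({row[j] for row in rows}) for j in range(n_vars)]
    return [
        [int(row[j] == c) for j in range(n_vars) for c in categories[j]]
        for row in rows
    ]


def jaccard_dissimilarity(x, y):
    """Binary measure: 1 - (joint 1s) / (positions where at least one is 1)."""
    both = sum(a and b for a, b in zip(x, y))
    either = sum(a or b for a, b in zip(x, y))
    return 1 - both / either if either else 0.0


def overlap_dissimilarity(x, y):
    """Categorical measure: share of variables whose categories differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


binary = dummy_encode(data)
pairs = list(combinations(range(len(data)), 2))

# Approach 1: binary transformation, then a binary measure.
d_binary = {p: jaccard_dissimilarity(binary[p[0]], binary[p[1]]) for p in pairs}
# Approach 2: a measure applied directly to the categorical data.
d_categorical = {p: overlap_dissimilarity(data[p[0]], data[p[1]]) for p in pairs}

for p in pairs:
    print(p, round(d_binary[p], 3), round(d_categorical[p], 3))
```

In this toy case the three nominal variables become six dummy variables, which illustrates why the binary transformation can increase the computational load. The two measures also assign different dissimilarities to the same pair of objects, so the resulting cluster hierarchies need not coincide.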

Keywords: hierarchical cluster analysis; nominal variable; binary variable; categorical data; similarity measures; evaluation criteria; generated data
Date: 2019

Downloads: (external link)
https://doi.org/10.21307/stattrans-2019-013 (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:vrs:stintr:v:20:y:2019:i:2:p:33-47:n:1

DOI: 10.21307/stattrans-2019-013

Statistics in Transition New Series is currently edited by Włodzimierz Okrasa

More articles in Statistics in Transition New Series from Statistics Poland
Bibliographic data for series maintained by Peter Golla.

 
Page updated 2025-03-20
Handle: RePEc:vrs:stintr:v:20:y:2019:i:2:p:33-47:n:1