Clustering of categorical variables around latent variables
Jérome Saracco, Marie Chavent and Vanessa Kuentz
Cahiers du GREThA (2007-2019) from Groupe de Recherche en Economie Théorique et Appliquée (GREThA)
Abstract:
In clustering, the usual aim is to group observations rather than variables. However, the need to cluster variables arises in dimension reduction, in variable selection and in some case studies (sensory analysis, biochemistry, marketing, etc.). Clustering of variables is then studied as a way to arrange variables into homogeneous clusters, thereby organizing data into meaningful structures. Once the variables are grouped so that each variable is similar to the other variables in its cluster, a subset of variables can be selected. Several specific methods have been developed for the clustering of numerical variables, but far fewer have been proposed for categorical variables. In this paper we extend the criterion used by Vigneau and Qannari (2003) in their Clustering around Latent Variables approach for numerical variables to the case of categorical data. The homogeneity criterion of a cluster of categorical variables is defined as the sum of the correlation ratios between the categorical variables and a latent variable, which in this case is a numerical variable. We show that the latent variable maximizing the homogeneity of a cluster can be obtained with Multiple Correspondence Analysis. Different algorithms for the clustering of categorical variables are proposed: an iterative relocation algorithm, and ascendant and divisive hierarchical clustering. The proposed methodology is illustrated by a real data application on the satisfaction of pleasure craft operators.
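The homogeneity criterion described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names (`correlation_ratio`, `indicator_matrix`, `mca_first_component`, `homogeneity`) are hypothetical, the variables are assumed to be stored as integer category codes in a NumPy array, and Multiple Correspondence Analysis is computed here as a correspondence analysis of the indicator matrix via an SVD. Under those assumptions, the first MCA component serves as the latent variable of a cluster, and the sum of correlation ratios equals the number of variables times the first MCA eigenvalue.

```python
import numpy as np

def correlation_ratio(codes, y):
    """Correlation ratio eta^2: between-category sum of squares of y over total sum of squares."""
    y = np.asarray(y, dtype=float)
    total = ((y - y.mean()) ** 2).sum()
    if total == 0:
        return 0.0
    between = sum(
        (codes == k).sum() * (y[codes == k].mean() - y.mean()) ** 2
        for k in np.unique(codes)
    )
    return between / total

def indicator_matrix(X):
    """Stack the 0/1 dummy columns of every categorical column of X (n x p category codes)."""
    blocks = [(X[:, [j]] == np.unique(X[:, j])).astype(float) for j in range(X.shape[1])]
    return np.hstack(blocks)

def mca_first_component(X):
    """First MCA component (row principal coordinates) and its eigenvalue."""
    n, p = X.shape
    P = indicator_matrix(X) / (n * p)          # correspondence matrix
    r = P.sum(axis=1)                          # row masses (all equal to 1/n)
    c = P.sum(axis=0)                          # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    latent = U[:, 0] * s[0] / np.sqrt(r)       # numerical latent variable
    return latent, s[0] ** 2

def homogeneity(X, y):
    """Sum of correlation ratios between each categorical column of X and y."""
    return sum(correlation_ratio(X[:, j], y) for j in range(X.shape[1]))

# Illustrative random data: 60 observations, 4 categorical variables with 3 levels each.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(60, 4))
latent, eigenvalue = mca_first_component(X)
print(homogeneity(X, latent), X.shape[1] * eigenvalue)  # the two values coincide
```

In a clustering algorithm built on this criterion, one would presumably compute `mca_first_component` on the columns of each cluster and compare `correlation_ratio` values to decide where a variable fits best; that step is only named in the abstract, so it is left out of the sketch.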
Keywords: clustering of categorical variables; correlation ratio; iterative relocation algorithm; hierarchical clustering
JEL-codes: C49 C69
Date: 2010
Downloads: http://cahiersdugretha.u-bordeaux.fr/2010/2010-02.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:grt:wpegrt:2010-02
Bibliographic data for series maintained by Ernest Miguelez.