Sequential dimension reduction and clustering of mixed-type data
Angelos Markos,
Odysseas Moschidis and
Theodore Chadjipantelis
International Journal of Data Analysis Techniques and Strategies, 2020, vol. 12, issue 3, 228-246
Abstract:
Clustering of a set of objects described by a mixture of continuous and categorical variables can be a challenging task. In the context of data reduction, an effective class of methods combines dimension reduction with clustering in the reduced space. In this paper, we review three approaches for sequential dimension reduction and clustering of mixed-type data. The first step of each approach involves the application of principal component analysis to a suitably transformed matrix. In the second step, a partitioning or hierarchical clustering algorithm is applied to the object scores in the reduced space. The common theoretical underpinnings of the three approaches are highlighted. The results of a benchmarking study show that sequential dimension reduction and clustering is an effective strategy, especially when the categorical variables are more informative than the continuous ones with regard to the underlying cluster structure. Strengths and limitations are also demonstrated on a real mixed-type dataset.
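The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not claim to reproduce any of the three reviewed approaches: the transformation below (z-scored continuous columns plus centered indicator columns scaled by the square root of each category proportion) is an assumed FAMD-style coding, k-means stands in for the partitioning step, and the synthetic data and all function names are hypothetical.

```python
import numpy as np

def transform_mixed(X_num, X_cat):
    """Assumed FAMD-style coding: z-score continuous columns; one-hot encode
    categorical columns, scale by 1/sqrt(category proportion), then center."""
    Z = (X_num - X_num.mean(axis=0)) / X_num.std(axis=0)
    blocks = [Z]
    for j in range(X_cat.shape[1]):
        levels = np.unique(X_cat[:, j])
        G = (X_cat[:, [j]] == levels[None, :]).astype(float)  # indicator matrix
        G = G / np.sqrt(G.mean(axis=0))                        # scale by proportion
        blocks.append(G - G.mean(axis=0))                      # center columns
    return np.hstack(blocks)

def pca_scores(M, n_components):
    """Step 1: PCA of the column-centered matrix M via SVD; returns object scores."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def kmeans(S, k, n_iter=100, seed=0):
    """Step 2: plain Lloyd's k-means on the object scores (random initial centers)."""
    rng = np.random.default_rng(seed)
    centers = S[rng.choice(len(S), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            S[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical synthetic data: two groups, two continuous and two categorical variables.
rng = np.random.default_rng(42)
n = 60
group = np.repeat([0, 1], n // 2)
X_num = rng.normal(loc=group[:, None] * 3.0, scale=1.0, size=(n, 2))
informative = np.where(rng.random(n) < 0.9, group, 1 - group)  # tracks the groups
noise = rng.integers(0, 3, size=n)                             # unrelated to the groups
X_cat = np.column_stack([informative.astype(str), noise.astype(str)])

M = transform_mixed(X_num, X_cat)        # suitably transformed matrix
scores = pca_scores(M, n_components=2)   # object scores in the reduced space
labels = kmeans(scores, k=2)             # cluster assignment per object
```

A hierarchical algorithm could be applied to `scores` in place of `kmeans`; the number of retained components and the number of clusters are free choices in this sketch.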
Keywords: cluster analysis; dimension reduction; correspondence analysis; principal component analysis; PCA; mixed-type data
Date: 2020
Downloads:
http://www.inderscience.com/link.php?id=108043 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:injdan:v:12:y:2020:i:3:p:228-246