Latent class model with conditional dependency per modes to cluster categorical data
Matthieu Marbac,
Christophe Biernacki and
Vincent Vandewalle
Additional contact information
Matthieu Marbac: McMaster University
Christophe Biernacki: University Lille 1 and CNRS and Inria Lille
Vincent Vandewalle: EA 2694 University Lille 2 and Inria Lille
Advances in Data Analysis and Classification, 2016, vol. 10, issue 2, No 5, 183-207
Abstract:
We propose a parsimonious extension of the classical latent class model for clustering categorical data that relaxes the conditional independence assumption. Under this new mixture model, named the conditional modes model (CMM), variables are grouped into conditionally independent blocks. Each block follows a parsimonious multinomial distribution in which the few free parameters model the probabilities of the most likely levels, while the remaining probability mass is spread uniformly over the other levels of the block. Thus, when the conditional independence assumption holds, this model defines parsimonious versions of the standard latent class model. Moreover, when this assumption is violated, the proposed model brings out the main intra-class dependencies between variables, thus summarizing each class with relatively few characteristic levels. Model selection is carried out by a hybrid MCMC algorithm that does not require preliminary parameter estimation. Maximum likelihood estimation is then performed via an EM algorithm for the best model only. The model properties are illustrated on simulated data and on three real data sets using the associated R package CoModes. The results show that this model reduces the biases induced by the conditional independence assumption while providing meaningful parameters.
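The block distribution described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper or from the CoModes package. The function name `block_distribution`, its arguments, and the example numbers are all hypothetical. The sketch assumes a block's joint levels are the Cartesian product of its variables' levels, and it does not enforce that mode probabilities exceed the uniform remainder.

```python
from itertools import product


def block_distribution(n_levels, modes):
    """Illustrative mode-based multinomial for one block (hypothetical helper).

    n_levels: tuple giving the number of levels of each variable in the block.
    modes:    dict mapping a joint level (tuple of level indices) to its
              probability; these are the only free parameters.
    Returns a dict mapping every joint level of the block to its probability.
    """
    # Joint levels of the block: Cartesian product of the variables' levels.
    joint_levels = list(product(*(range(m) for m in n_levels)))

    unknown = [lvl for lvl in modes if lvl not in joint_levels]
    if unknown:
        raise ValueError(f"modes outside the block's levels: {unknown}")

    mode_mass = sum(modes.values())
    if mode_mass < 0.0 or mode_mass > 1.0 + 1e-12:
        raise ValueError("mode probabilities must sum to at most 1")

    # Remaining mass is spread uniformly over the non-mode joint levels.
    n_other = len(joint_levels) - len(modes)
    if n_other == 0:
        if abs(mode_mass - 1.0) > 1e-12:
            raise ValueError("all levels are modes, so they must sum to 1")
        rest = 0.0
    else:
        rest = (1.0 - mode_mass) / n_other

    return {lvl: modes.get(lvl, rest) for lvl in joint_levels}


# Example: a block of two variables with 2 and 3 levels (6 joint levels)
# and two modes; the four other joint levels share the remaining mass.
dist = block_distribution((2, 3), {(0, 0): 0.5, (1, 2): 0.3})
```

In this example only two probabilities are free parameters, whereas an unconstrained multinomial over six joint levels would need five.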
Keywords: Categorical data; Clustering; Integrated complete-data likelihood; MCMC algorithm; Mixture models; Model selection; 62H30; 62F15; 62-07; 62F07
Date: 2016
Downloads: (external link)
http://link.springer.com/10.1007/s11634-016-0250-1 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:advdac:v:10:y:2016:i:2:d:10.1007_s11634-016-0250-1
DOI: 10.1007/s11634-016-0250-1
Advances in Data Analysis and Classification is currently edited by H.-H. Bock, W. Gaul, A. Okada, M. Vichi and C. Weihs