EconPapers    
Economics at your fingertips  
 

Mixture of Networks for Clustering Categorical Data: A Penalized Composite Likelihood Approach

Jangsun Baek and Jeong-Soo Park

The American Statistician, 2023, vol. 77, issue 3, 259-273

Abstract: One of the challenges in clustering categorical data is the curse of dimensionality caused by the inherent sparsity of high-dimensional data, the records of which include a large number of attributes. The latent class model (LCM) assumes local independence between the variables in clusters, and is a parsimonious model-based clustering approach that has been used to circumvent the problem. The mixture of a log-linear model is more flexible but requires more parameters to be estimated. In this research, we recognize that each categorical observation can be conceived as a network with pairwise linked nodes, which are the response levels of the observation attributes. Therefore, the categorical data for clustering is considered a finite mixture of different component layer networks with distinct patterns. We apply a penalized composite likelihood approach to a finite mixture of networks for sparse multivariate categorical data to reduce the number of parameters, implement the EM algorithm to estimate the model parameters, and show that the estimates are consistent and satisfy asymptotic normality. The performance of the proposed approach is shown to be better in comparison with the conventional methods for both synthetic and real datasets.

Date: 2023
References: Add references at CitEc
Citations:

Downloads: (external link)
http://hdl.handle.net/10.1080/00031305.2022.2141856 (text/html)
Access to full text is restricted to subscribers.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:taf:amstat:v:77:y:2023:i:3:p:259-273

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/UTAS20

DOI: 10.1080/00031305.2022.2141856

Access Statistics for this article

The American Statistician is currently edited by Eric Sampson

More articles in The American Statistician from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst ().

 
Page updated 2025-03-20
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:259-273