Convex clustering for binary data
Hosik Choi and
Seokho Lee ()
Additional contact information
Hosik Choi: Kyonggi University
Seokho Lee: Hankuk University of Foreign Studies
Advances in Data Analysis and Classification, 2019, vol. 13, issue 4, No 8, 1018 pages
Abstract:
Abstract We present a new clustering algorithm for multivariate binary data. The new algorithm is based on the convex relaxation of hierarchical clustering, which is achieved by considering the binomial likelihood as a natural distribution for binary data and by formulating convex clustering using a pairwise penalty on prototypes of clusters. Under convex clustering, we show that the typical $$\ell _1$$ℓ1 pairwise fused penalty results in ineffective cluster formation. In an attempt to promote the clustering performance and select the relevant clustering variables, we propose the penalized maximum likelihood estimation with an $$\ell _2$$ℓ2 fused penalty on the fusion parameters and an $$\ell _1$$ℓ1 penalty on the loading matrix. We provide an efficient algorithm to solve the optimization by using majorization-minimization algorithm and alternative direction method of multipliers. Numerical studies confirmed its good performance and real data analysis demonstrates the practical usefulness of the proposed method.
Keywords: Binary data; Convex clustering; Dimension reduction; Fused penalty; 62H30 (search for similar items in EconPapers)
Date: 2019
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://link.springer.com/10.1007/s11634-018-0350-1 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:advdac:v:13:y:2019:i:4:d:10.1007_s11634-018-0350-1
Ordering information: This journal article can be ordered from
http://www.springer. ... ds/journal/11634/PS2
DOI: 10.1007/s11634-018-0350-1
Access Statistics for this article
Advances in Data Analysis and Classification is currently edited by H.-H. Bock, W. Gaul, A. Okada, M. Vichi and C. Weihs
More articles in Advances in Data Analysis and Classification from Springer, German Classification Society - Gesellschaft für Klassifikation (GfKl), Japanese Classification Society (JCS), Classification and Data Analysis Group of the Italian Statistical Society (CLADAG), International Federation of Classification Societies (IFCS)
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().