Geometric consistency of principal component scores for high‐dimensional mixture models and its application
Kazuyoshi Yata and Makoto Aoshima
Scandinavian Journal of Statistics, 2020, vol. 47, issue 3, 899-921
Abstract:
In this article, we consider clustering based on principal component analysis (PCA) for high‐dimensional mixture models. We present theoretical reasons why PCA is effective for clustering high‐dimensional data. First, we derive a geometric representation of high‐dimension, low‐sample‐size (HDLSS) data taken from a two‐class mixture model. With the help of the geometric representation, we give geometric consistency properties of sample principal component scores in the HDLSS context. We develop ideas of the geometric representation and provide geometric consistency properties for multiclass mixture models. We show that PCA can cluster HDLSS data under certain conditions in a surprisingly explicit way. Finally, we demonstrate the performance of the clustering using gene expression datasets.
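The idea of clustering HDLSS data by principal component scores can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: it assumes a two-class Gaussian mixture with identity covariance and equal class sizes, and it clusters by the sign of the first principal component score. The function names `pc1_scores` and `simulate_two_class`, and all parameter values (sample sizes, dimension, mean shift), are hypothetical choices made for illustration only.

```python
import numpy as np


def pc1_scores(X):
    """Return the first principal component scores of the rows of X.

    Uses the n x n dual sample covariance matrix, which is cheap to
    decompose when the dimension d is much larger than the sample size n.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)              # center each variable
    S_dual = Xc @ Xc.T / (n - 1)         # n x n dual covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S_dual)  # eigenvalues in ascending order
    lam1, u1 = eigvals[-1], eigvecs[:, -1]
    # Scores on the first component: sqrt((n - 1) * lambda_1) * u_1
    return np.sqrt((n - 1) * lam1) * u1


def simulate_two_class(n1, n2, d, shift, rng):
    """Draw n1 + n2 points in d dimensions; class 1 has mean `shift` in every coordinate."""
    labels = np.concatenate([np.zeros(n1, dtype=int), np.ones(n2, dtype=int)])
    X = rng.standard_normal((n1 + n2, d))
    X[labels == 1] += shift
    return X, labels


# Illustrative (assumed) settings: 20 observations in 2000 dimensions.
rng = np.random.default_rng(0)
X, labels = simulate_two_class(n1=10, n2=10, d=2000, shift=1.0, rng=rng)

scores = pc1_scores(X)
pred = (scores > 0).astype(int)          # cluster by the sign of the PC1 score

# The sign of an eigenvector is arbitrary, so score agreement up to label swap.
accuracy = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"clustering accuracy: {accuracy:.2f}")
```

In this setting the between-class mean difference dominates the noise, so the first-component scores are expected to fall into two well-separated groups. Thresholding at zero is a simplification that suits equal class sizes; it is not taken from the article.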
Date: 2020
Downloads: https://doi.org/10.1111/sjos.12432
Persistent link: https://EconPapers.repec.org/RePEc:bla:scjsta:v:47:y:2020:i:3:p:899-921
Scandinavian Journal of Statistics is currently edited by Ørnulf Borgan and Bo Lindqvist
More articles in Scandinavian Journal of Statistics from Danish Society for Theoretical Statistics, Finnish Statistical Society, Norwegian Statistical Association, Swedish Statistical Association
Bibliographic data for series maintained by Wiley Content Delivery.