A Bayesian semiparametric factor analysis model for subtype identification
Warren Joshua L. and Zhao Hongyu
Additional contact information
Zhao Hongyu: Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT 06510, USA
Statistical Applications in Genetics and Molecular Biology, 2017, vol. 16, issue 2, 145-158
Disease subtype identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to infer disease subtypes, which often lead to biologically meaningful insights into disease. Despite many successes, existing clustering methods may not perform well when genes are highly correlated and many uninformative genes are included for clustering due to the high dimensionality. In this article, we introduce a novel subtype identification method in the Bayesian setting based on gene expression profiles. This method, called BCSub, adopts an innovative semiparametric Bayesian factor analysis model to reduce the dimension of the data to a few factor scores for clustering. Specifically, the factor scores are assumed to follow the Dirichlet process mixture model in order to induce clustering. Through extensive simulation studies, we show that BCSub has improved performance over commonly used clustering methods. When applied to two gene expression datasets, our model is able to identify subtypes that are clinically more relevant than those identified from the existing methods.
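To illustrate how a Dirichlet process prior can induce clustering, here is a minimal sketch of the Chinese restaurant process, the prior over partitions implied by a Dirichlet process. It is not the BCSub sampler described in the article. The function name sample_crp_partition and the concentration parameter alpha are assumptions introduced for illustration only, and the sketch draws labels from the prior without conditioning on any factor scores.

```python
import random


def sample_crp_partition(n_samples, alpha, seed=0):
    """Draw cluster labels for n_samples items from a Chinese restaurant process.

    alpha (> 0) is a hypothetical concentration parameter: larger values
    tend to produce more clusters.
    """
    rng = random.Random(seed)
    labels = []
    cluster_sizes = []
    for _ in range(n_samples):
        # An existing cluster is chosen with weight equal to its current size;
        # a brand-new cluster is opened with weight alpha.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)  # open a new cluster
        else:
            cluster_sizes[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, cluster_sizes


# Illustrative draw: 100 items, alpha chosen arbitrarily.
labels, cluster_sizes = sample_crp_partition(n_samples=100, alpha=1.0, seed=1)
print(len(cluster_sizes), cluster_sizes)
```

The number of clusters is not fixed in advance; it grows with the data, which is why a prior of this kind suits subtype identification when the number of subtypes is unknown.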
Keywords: Bayesian factor analysis; Bayesian nonparametrics; clustering; Dirichlet process; gene expression study
For access to full text, subscription to the journal or payment for the individual article is required.
Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:16:y:2017:i:2:p:145-158:n:3
Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
Bibliographic data for series maintained by Peter Golla.