A Bayesian semiparametric factor analysis model for subtype identification

Sun Jiehuan, Warren Joshua L. and Zhao Hongyu
Additional contact information
Zhao Hongyu: Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT 06510, USA

Statistical Applications in Genetics and Molecular Biology, 2017, vol. 16, issue 2, 145-158

Abstract: Disease subtype identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to infer disease subtypes, which often lead to biologically meaningful insights into disease. Despite many successes, existing clustering methods may not perform well when genes are highly correlated and many uninformative genes are included for clustering due to the high dimensionality. In this article, we introduce a novel subtype identification method in the Bayesian setting based on gene expression profiles. This method, called BCSub, adopts an innovative semiparametric Bayesian factor analysis model to reduce the dimension of the data to a few factor scores for clustering. Specifically, the factor scores are assumed to follow the Dirichlet process mixture model in order to induce clustering. Through extensive simulation studies, we show that BCSub has improved performance over commonly used clustering methods. When applied to two gene expression datasets, our model is able to identify subtypes that are clinically more relevant than those identified from the existing methods.
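
Illustrative sketch: the abstract describes a factor model whose low-dimensional factor scores follow a Dirichlet process mixture. The code below is a minimal generative simulation of that general idea, not the authors' BCSub implementation or their exact model. It assumes a Chinese restaurant process representation of the Dirichlet process, Gaussian loadings, Gaussian noise, and unit-variance factor scores around cluster-specific means. All function names, parameter names, and default values are hypothetical choices made for illustration.

```python
import random


def crp_assignments(n, alpha, rng):
    """Draw cluster labels for n samples from a Chinese restaurant process."""
    labels, counts = [], []
    for _ in range(n):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)  # open a new cluster
        counts[k] += 1
        labels.append(k)
    return labels


def simulate(n=30, p=8, q=2, alpha=1.0, noise_sd=0.5, seed=0):
    """Simulate n samples of p 'genes' driven by q latent factors.

    Assumed model (illustrative only):
      labels  ~ CRP(alpha)
      eta_i   ~ Normal(mean of cluster labels[i], 1)   (factor scores)
      y_i     = loadings @ eta_i + Normal(0, noise_sd^2) noise
    """
    rng = random.Random(seed)
    labels = crp_assignments(n, alpha, rng)
    n_clusters = max(labels) + 1
    # Cluster-specific means of the factor scores, drawn from an assumed base measure.
    means = [[rng.gauss(0, 3) for _ in range(q)] for _ in range(n_clusters)]
    # Loading matrix (p genes x q factors), shared by all samples.
    loadings = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(p)]
    scores, data = [], []
    for z in labels:
        eta = [rng.gauss(m, 1) for m in means[z]]
        y = [sum(l * e for l, e in zip(row, eta)) + rng.gauss(0, noise_sd)
             for row in loadings]
        scores.append(eta)
        data.append(y)
    return labels, scores, data


labels, scores, data = simulate()
print("number of simulated subtypes:", max(labels) + 1)
```

In this sketch the clustering structure lives entirely in the q-dimensional factor scores, so samples are grouped in the reduced space rather than across all p correlated genes. A full Bayesian analysis would infer the labels, loadings, and scores jointly from observed data, which this simulation does not attempt.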

Keywords: Bayesian factor analysis; Bayesian nonparametrics; clustering; Dirichlet process; gene expression study
Date: 2017

DOI: 10.1515/sagmb-2016-0051

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.

Page updated 2021-05-07
Handle: RePEc:bpj:sagmbi:v:16:y:2017:i:2:p:145-158:n:3