Comparing the performance of linear and nonlinear principal components in the context of high-dimensional genomic data integration
Islam Shofiqul,
Anand Sonia,
Hamid Jemila,
Thabane Lehana and
Beyene Joseph
Additional contact information
Islam Shofiqul: Population Health Research Institute, McMaster University and Hamilton Health Sciences, Hamilton, Ontario, Canada
Anand Sonia: Population Health Research Institute, McMaster University and Hamilton Health Sciences, Hamilton, Ontario, Canada
Hamid Jemila: Department of Medicine, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada
Thabane Lehana: Population Health Research Institute, McMaster University and Hamilton Health Sciences, Hamilton, Ontario, Canada
Beyene Joseph: Department of Medicine, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada
Statistical Applications in Genetics and Molecular Biology, 2017, vol. 16, issue 3, 199-216
Abstract:
Linear principal component analysis (PCA) is a widely used approach for reducing the dimension of gene or miRNA expression data sets. This method relies on a linearity assumption, which often fails to capture the patterns and relationships inherent in the data. Thus, a nonlinear approach such as kernel PCA might be better suited. We develop a copula-based simulation algorithm that takes into account the degree of dependence and nonlinearity observed in these data sets. Using this algorithm, we conduct an extensive simulation to compare the performance of linear and kernel principal component analysis methods for data integration and death classification. We also compare these methods using a real data set of gene and miRNA expression from lung cancer patients. The first few kernel principal components perform poorly compared to the linear principal components in this setting. Reducing dimensions with linear PCA and then classifying with a logistic regression model appears adequate for this purpose. Integrating information from multiple data sets using either of these two approaches leads to improved classification accuracy for the outcome.
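The abstract does not spell out the copula-based simulation algorithm, so the following is only a minimal illustrative sketch of the general idea behind copula sampling, not the authors' method. It assumes a Gaussian copula with a single correlation parameter and uses exponential marginals, which are the gamma distribution with shape 1, because that quantile function has a closed form. The function name `gaussian_copula_exponential` and all of its parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_exponential(n, rho, rate_x=1.0, rate_y=1.0, seed=0):
    """Draw n dependent pairs (x, y) with exponential marginals.

    Assumption: dependence comes from a Gaussian copula with correlation
    rho; the marginals are exponential (gamma with shape 1). This is a
    generic illustration, not the algorithm from the paper.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        # Step 1: correlated standard normals (2x2 Cholesky factor).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Step 2: map to the uniform (copula) scale with the normal CDF.
        # Clamp just below 1 so the logarithm in step 3 stays defined.
        u1 = min(std_normal.cdf(z1), 1.0 - 1e-12)
        u2 = min(std_normal.cdf(z2), 1.0 - 1e-12)
        # Step 3: inverse exponential CDF gives the target marginals.
        x = -math.log1p(-u1) / rate_x
        y = -math.log1p(-u2) / rate_y
        pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    sample = gaussian_copula_exponential(5, rho=0.8, seed=1)
    for x, y in sample:
        print(f"{x:.3f}\t{y:.3f}")
```

Separating the dependence structure (steps 1 and 2) from the marginals (step 3) is what lets a simulation of this kind vary the strength of dependence without changing the marginal distributions. A general gamma shape would need a numerical quantile function in place of the closed-form expression used in step 3.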
Keywords: AUC; Copula; Gamma distribution; Kernel PCA; principal component
Date: 2017
Downloads: https://doi.org/10.1515/sagmb-2016-0066 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.
Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:16:y:2017:i:3:p:199-216:n:3
Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html
DOI: 10.1515/sagmb-2016-0066
Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
Bibliographic data for series maintained by Peter Golla.