EconPapers    
Economics at your fingertips  
 

Sparse Canonical Correlation Analysis with Application to Genomic Data Integration

Parkhomenko Elena, Tritchler David and Beyene Joseph
Additional contact information
Parkhomenko Elena: Hospital for Sick Children Research Institute
Tritchler David: University of Toronto, State University of New York at Buffalo, Ontario Cancer Institute
Beyene Joseph: Hospital for Sick Children Research Institute, University of Toronto

Statistical Applications in Genetics and Molecular Biology, 2009, vol. 8, issue 1, 36

Abstract: Large scale genomic studies with multiple phenotypic or genotypic measures may require the identification of complex multivariate relationships. In multivariate analysis a common way to inspect the relationship between two sets of variables based on their correlation is canonical correlation analysis, which determines linear combinations of all variables of each type with maximal correlation between the two linear combinations. However, in high dimensional data analysis, when the number of variables under consideration exceeds tens of thousands, linear combinations of the entire sets of features may lack biological plausibility and interpretability. In addition, insufficient sample size may lead to computational problems, inaccurate estimates of parameters and non-generalizable results. These problems may be solved by selecting sparse subsets of variables, i.e. obtaining sparse loadings in the linear combinations of variables of each type. In this paper we present Sparse Canonical Correlation Analysis (SCCA) which examines the relationships between two types of variables and provides sparse solutions that include only small subsets of variables of each type by maximizing the correlation between the subsets of variables of different types while performing variable selection. We also present an extension of SCCA - adaptive SCCA. We evaluate their properties using simulated data and illustrate practical use by applying both methods to the study of natural variation in human gene expression.

Keywords: canonical correlation; sparseness; data integration (search for similar items in EconPapers)
Date: 2009
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (16)

Downloads: (external link)
https://doi.org/10.2202/1544-6115.1406 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:8:y:2009:i:1:n:1

Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html

DOI: 10.2202/1544-6115.1406

Access Statistics for this article

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla ().

 
Page updated 2025-03-19
Handle: RePEc:bpj:sagmbi:v:8:y:2009:i:1:n:1