Resistant multiple sparse canonical correlation

Coleman Jacob, Replogle Joseph, Chandler Gabriel and Hardin Johanna
Additional contact information
Coleman Jacob: Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA
Replogle Joseph: Medical Scientist Training Program, University of California - San Francisco, San Francisco, CA 94143, USA
Hardin Johanna: Department of Mathematics, Pomona College, Claremont, CA 91711, USA

Statistical Applications in Genetics and Molecular Biology, 2016, vol. 15, issue 2, 123-138

Abstract: Canonical correlation analysis (CCA) is a multivariate technique that takes two datasets and forms the most highly correlated possible pairs of linear combinations between them. Each subsequent pair of linear combinations is orthogonal to the preceding pair, meaning that new information is gleaned from each pair. By looking at the magnitude of coefficient values, we can find out which variables can be grouped together, thus better understanding multiple interactions that are otherwise difficult to compute or grasp intuitively. CCA appears to have quite powerful applications to high-throughput data, as we can use it to discover, for example, relationships between gene expression and gene copy number variation. One of the biggest problems of CCA is that the number of variables (often upwards of 10,000) makes biological interpretation of linear combinations nearly impossible. To limit variable output, we have employed a method known as sparse canonical correlation analysis (SCCA), while adding estimation which is resistant to extreme observations or other types of deviant data. In this paper, we have demonstrated the success of resistant estimation in variable selection using SCCA. Additionally, we have used SCCA to find multiple canonical pairs for extended knowledge about the datasets at hand. Again, using resistant estimators provided more accurate estimates than standard estimators in the multiple canonical correlation setting. R code is available and documented at https://github.com/hardin47/rmscca.
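
As an illustration of the classical CCA step the abstract describes (not the authors' resistant sparse method, whose R code lives at the GitHub link above), the following is a minimal NumPy sketch that finds the first canonical pair by whitening the two covariance matrices and taking an SVD of the cross-covariance. The function name `first_canonical_pair`, the `ridge` stabilizer, and the simulated data are assumptions made for this example only.

```python
import numpy as np

def first_canonical_pair(X, Y, ridge=1e-8):
    """Return (a, b, rho): weight vectors of the first canonical pair
    and their canonical correlation. Hypothetical helper for illustration."""
    n = X.shape[0]
    # Center columns so covariances reduce to scaled cross-products.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Sample covariances; a tiny ridge keeps the Cholesky factors stable.
    Sxx = Xc.T @ Xc / (n - 1) + ridge * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance K = Lx^{-1} Sxy Ly^{-T}.
    K = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, K.T).T
    # Leading singular triple gives the most correlated pair.
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    a = np.linalg.solve(Lx.T, U[:, 0])  # weights for the columns of X
    b = np.linalg.solve(Ly.T, Vt[0])    # weights for the columns of Y
    return a, b, s[0]

# Simulated data (an assumption): one shared latent signal z drives
# column 0 of X and column 1 of Y; every other column is pure noise.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = rng.normal(size=(200, 4))
X[:, 0] += 2 * z
Y = rng.normal(size=(200, 3))
Y[:, 1] += 2 * z

a, b, rho = first_canonical_pair(X, Y)
```

Inspecting the magnitudes of `a` and `b` shows which variables carry the shared signal, which is the interpretive step the abstract refers to. With thousands of variables these weight vectors are dense and hard to read, which is the motivation for the sparse variant; and because the sample covariances used here are sensitive to outliers, a resistant estimator would replace them in the approach the paper studies.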

Date: 2016

Downloads: (external link)
https://doi.org/10.1515/sagmb-2014-0081 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:15:y:2016:i:2:p:123-138:n:1

Ordering information: This journal article can be ordered from
https://www.degruyter.com/view/j/sagmb

DOI: 10.1515/sagmb-2014-0081

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

Bibliographic data for series maintained by Peter Golla.

 
Page updated 2021-05-07
Handle: RePEc:bpj:sagmbi:v:15:y:2016:i:2:p:123-138:n:1