EconPapers    
 

A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data

Nan Lin, Yun Zhu, Ruzong Fan and Momiao Xiong

PLOS Computational Biology, 2017, vol. 13, issue 10, 1-33

Abstract: Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information for a deeper understanding of the complex genetic structure of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current paradigm of multiple-phenotype association analysis lacks breadth (the number of phenotypes and genetic variants analyzed jointly) and depth (the hierarchical structure of phenotypes and genotypes). A key issue for high-dimensional pleiotropic analysis is to effectively extract informative internal representations and features from high-dimensional genotype and phenotype data. To explore the correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers to developing novel statistical methods and computational algorithms for genetic pleiotropic analysis, we propose a new statistical method for association analysis, referred to as quadratically regularized functional canonical correlation analysis (QRFCCA), which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that QRFCCA has much higher power than the ten competing statistics while retaining appropriate type 1 error rates. To further evaluate its performance, QRFCCA and the ten other statistics are applied to the whole-genome sequencing dataset from the TwinsUK study. Using QRFCCA, we identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits. The results show that QRFCCA substantially outperforms the ten other statistics.

Author summary: Association analysis of multiple phenotypes will unravel the pleiotropic genetic structure of multiple phenotypes and provide a powerful tool for developing drugs with fewer side effects. To increase the power of high-dimensional association tests of multiple phenotypes with next-generation sequencing data, a key issue is to develop novel statistics that can effectively extract informative internal representations and features from high-dimensional data. However, the current paradigm of multiple-phenotype association analysis does not efficiently utilize the rich correlation structure of genotype and phenotype data. To shift the paradigm of association analysis from shallow multivariate analysis to comprehensive functional analysis, we propose a new general statistical framework for association tests, referred to as quadratically regularized functional canonical correlation analysis (QRFCCA), which explores the rich correlation information in genotype and phenotype data. Large-scale simulations demonstrate that QRFCCA has much higher power than many existing statistics while retaining appropriate type 1 error rates. To further evaluate the new approach, QRFCCA is also applied to the TwinsUK study with 46 traits and sequencing data. The results show that QRFCCA substantially outperforms the other statistics.

Date: 2017

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005788 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 05788&type=printable (application/pdf)



Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1005788

DOI: 10.1371/journal.pcbi.1005788


More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol.

 
Page updated 2025-03-19
Handle: RePEc:plo:pcbi00:1005788