FedGMMAT: Federated generalized linear mixed model association tests
Wentao Li, Han Chen, Xiaoqian Jiang and Arif Harmanci
PLOS Computational Biology, 2024, vol. 20, issue 7, 1-28
Abstract:
Increasing the size of genetic and phenotypic datasets is critical for understanding the genetic determinants of disease. Establishing practical means for collaboration and data sharing among institutions therefore remains a fundamental methodological barrier to performing high-powered studies. As study samples become more heterogeneous, complex statistical approaches, such as generalized linear mixed effects models, must be used to correct for confounders that may bias results. On another front, due to privacy concerns around Protected Health Information (PHI), the sharing of genetic information is restricted by regulations such as the Health Insurance Portability and Accountability Act (HIPAA). This limits data sharing among institutions and hampers efforts to carry out high-powered collaborative studies. Federated approaches are promising for alleviating the issues around privacy and performance, since sensitive data never leave the local sites. Motivated by these challenges, we developed FedGMMAT, a federated genetic association testing tool that uses a federated statistical testing approach for efficient association tests that can correct for confounding fixed effects and additive polygenic random effects among different collaborating sites. Genetic data are never shared among collaborating sites, and the intermediate statistics are protected by encryption. Using simulated and real datasets, we demonstrate that FedGMMAT can achieve virtually the same results as pooled analysis under a privacy-preserving framework with practical resource requirements.
Author summary:
Traditional genome-wide association study (GWAS) approaches require the transfer of sensitive genetic data to a central location for analysis, raising privacy and data security concerns. We propose a federated learning technique named FedGMMAT to conduct large-scale genetic analyses without the need to share sensitive information. By allowing data repositories to perform local computations and share only encrypted aggregated results, FedGMMAT ensures that individual genetic data remain secure and private. This novel algorithm enhances the capability of researchers to collaborate on GWAS across multiple sites, thereby increasing the statistical power and robustness of genetic discoveries while maintaining the confidentiality of participant data. This approach not only addresses privacy concerns but also paves the way for larger-scale genetic studies by enabling the participation of diverse datasets from various institutions.
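The idea that sites share only aggregated statistics, yet recover the pooled result, can be illustrated with a deliberately simplified sketch. It is not the FedGMMAT algorithm: it assumes a plain linear model with an intercept-only null (no covariates, no polygenic random effects) and omits encryption entirely. All names (`SiteSummary`, `local_summary`, `aggregate`, `score_statistic`) and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregated sums a site would share instead of individual-level data."""
    n: int
    sum_y: float
    sum_yy: float
    sum_g: float
    sum_gg: float
    sum_gy: float


def local_summary(y, g):
    """Computed at each site from its own phenotypes y and genotypes g."""
    return SiteSummary(
        n=len(y),
        sum_y=sum(y),
        sum_yy=sum(v * v for v in y),
        sum_g=sum(g),
        sum_gg=sum(v * v for v in g),
        sum_gy=sum(a * b for a, b in zip(g, y)),
    )


def aggregate(summaries):
    """Element-wise sum of site summaries (in a real system: done on encrypted values)."""
    total = SiteSummary(0, 0.0, 0.0, 0.0, 0.0, 0.0)
    for s in summaries:
        total = SiteSummary(
            total.n + s.n,
            total.sum_y + s.sum_y,
            total.sum_yy + s.sum_yy,
            total.sum_g + s.sum_g,
            total.sum_gg + s.sum_gg,
            total.sum_gy + s.sum_gy,
        )
    return total


def score_statistic(s):
    """Score test for one variant under the null y = mu + noise (simplifying assumption)."""
    mu = s.sum_y / s.n
    sigma2 = s.sum_yy / s.n - mu * mu           # null residual variance (MLE)
    u = s.sum_gy - mu * s.sum_g                 # score: sum of g_i * (y_i - mu)
    v = sigma2 * (s.sum_gg - s.sum_g ** 2 / s.n)  # variance of the score
    return u * u / v


# Toy data for two hypothetical sites (phenotype y, genotype dosage g).
y_a, g_a = [1.2, 0.7, 2.1, 1.9], [0, 1, 2, 1]
y_b, g_b = [0.4, 1.5, 2.2], [0, 1, 2]

# Federated: only the per-site summaries are combined.
federated_stat = score_statistic(
    aggregate([local_summary(y_a, g_a), local_summary(y_b, g_b)])
)

# Pooled reference: all individual-level data in one place.
pooled_stat = score_statistic(local_summary(y_a + y_b, g_a + g_b))

print(federated_stat, pooled_stat)
```

Because every quantity in the test statistic is a sum over individuals, adding the per-site sums gives the same statistic as the pooled analysis, up to floating-point rounding. The mixed-model setting described in the abstract is more involved, since random effects and iterative model fitting do not reduce to a single round of simple sums.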
Date: 2024
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012142 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 12142&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1012142
DOI: 10.1371/journal.pcbi.1012142