SPAGRM: effectively controlling for sample relatedness in large-scale genome-wide association studies of longitudinal traits
He Xu,
Yuzhuo Ma,
Lin-lin Xu,
Yin Li,
Yufei Liu,
Ying Li,
Xu-jie Zhou,
Wei Zhou,
Seunggeun Lee,
Peipei Zhang,
Weihua Yue and
Wenjian Bi
Additional contact information
He Xu: Peking University
Yuzhuo Ma: Peking University
Lin-lin Xu: Peking University First Hospital; Peking University Institute of Nephrology
Yin Li: Peking University Health Science Center
Yufei Liu: Peking University
Ying Li: Peking University
Xu-jie Zhou: Peking University First Hospital; Peking University Institute of Nephrology
Wei Zhou: Massachusetts General Hospital
Seunggeun Lee: Seoul National University
Peipei Zhang: Peking University Health Science Center
Weihua Yue: National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital)
Wenjian Bi: Peking University
Nature Communications, 2025, vol. 16, issue 1, 1-19
Abstract:
Sample relatedness is a major confounder in genome-wide association studies (GWAS), potentially leading to inflated type I error rates if not appropriately controlled. A common strategy is to incorporate a random effect based on a genetic relatedness matrix (GRM) into regression models. However, this approach is challenging for large-scale GWAS of complex traits, such as longitudinal traits. Here we propose a scalable and accurate analysis framework, SPAGRM, which controls for sample relatedness via a precise approximation of the joint distribution of genotypes. SPAGRM can utilize GRM-free models and is thus applicable to various trait types and statistical methods, including linear mixed models and generalized estimating equations for longitudinal traits. A hybrid strategy incorporating saddlepoint approximation greatly improves accuracy when analyzing low-frequency and rare genetic variants, especially under unbalanced phenotypic distributions. We also introduce SPAGRM(CCT), which aggregates results from different models via the Cauchy combination test. Extensive simulations and real data analyses demonstrated that SPAGRM maintains well-controlled type I error rates and that SPAGRM(CCT) can serve as a broadly effective method. Applying SPAGRM to 79 longitudinal traits extracted from UK Biobank primary care data, we identified 7,463 genetic loci, in a pioneering attempt to conduct GWAS that treat these traits as longitudinal.
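As a rough illustration of the aggregation step named in the abstract, the following is a minimal sketch of a generic Cauchy combination test, not the SPAGRM(CCT) implementation itself. The function name `cauchy_combination` and the default of equal weights are assumptions made here for illustration; the abstract does not specify how the per-model p-values are weighted.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with a generic Cauchy combination test.

    Hypothetical helper for illustration only; equal weights are assumed
    when none are given.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    k = len(p_values)
    if weights is None:
        weights = [1.0 / k] * k
    if len(weights) != k:
        raise ValueError("weights and p_values must have the same length")

    # Normalise the weights so that they sum to one.
    total = sum(weights)
    weights = [w / total for w in weights]

    # Map each p-value to a standard Cauchy variate and take the weighted sum.
    stat = 0.0
    for p, w in zip(p_values, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
        stat += w * math.tan((0.5 - p) * math.pi)

    # Upper-tail probability of the standard Cauchy distribution at `stat`.
    return 0.5 - math.atan(stat) / math.pi


if __name__ == "__main__":
    # Example: p-values for one variant from two hypothetical models.
    print(cauchy_combination([0.01, 0.5]))
```

With a single input the combined p-value equals that input, and the combined value is driven mainly by the smallest p-values. For extremely small p-values the tangent transform loses floating-point precision, so a production implementation would need a separate approximation in that regime.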
Date: 2025
Downloads: (external link)
https://www.nature.com/articles/s41467-025-56669-1 Abstract (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-56669-1
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-025-56669-1
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.