Efficient Two-Stage Analysis for Complex Trait Association with Arbitrary Depth Sequencing Data
Zheng Xu,
Song Yan,
Shuai Yuan,
Cong Wu,
Sixia Chen,
Zifang Guo and
Yun Li
Additional contact information
Zheng Xu: Department of Mathematics and Statistics, Wright State University, Dayton, OH 45324, USA
Song Yan: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Shuai Yuan: GlaxoSmithKline plc, Collegeville, PA 19426, USA
Cong Wu: Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68508, USA
Sixia Chen: Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
Zifang Guo: Merck & Co., Inc., Rahway, NJ 07065, USA
Yun Li: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Stats, 2023, vol. 6, issue 1, 1-14
Abstract:
Sequencing-based genetic association analysis is typically performed by first generating genotype calls from sequence data and then performing association tests on the called genotypes. Standard approaches require accurate genotype calling (GC), which can be achieved either with high sequencing depth (typically available in only a small number of individuals) or via computationally intensive multi-sample linkage disequilibrium (LD)-aware methods. We propose a computationally efficient two-stage combination approach for association analysis. In the first stage, single-nucleotide polymorphisms (SNPs) are screened via a rapid maximum likelihood (ML)-based method applied directly to the sequence data, without first calling genotypes. In the second stage, the selected SNPs are evaluated by performing association tests on genotypes from multi-sample LD-aware calling. Extensive simulation- and real data-based studies show that the proposed two-stage approaches can save 80% of the computational cost while still retaining more than 90% of the power of the classical approach of genotyping all markers, across sequencing depths d ≥ 2.
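The two-stage design described in the abstract can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the authors' method. Stage 1 uses a simple binomial read model with a fixed per-base error rate (`ERR`), EM estimates of allele frequency under Hardy-Weinberg equilibrium, and a case-control likelihood-ratio statistic computed from genotype likelihoods without calling genotypes. Stage 2 applies a Pearson allelic chi-square test to called genotypes, with simulated true genotypes standing in for multi-sample LD-aware calls. All function names (`genotype_likelihoods`, `allele_freq_ml`, `stage1_lrt`, `stage2_allelic_p`, `two_stage`, `simulate_snp`) and the selection fraction `frac` are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple

# Hypothetical per-base sequencing error rate; a real pipeline would use base qualities.
ERR = 0.01
GL = Tuple[float, float, float]


def genotype_likelihoods(ref: int, alt: int, err: float = ERR) -> GL:
    """P(reads | G) for G = 0, 1, 2 alt alleles under a simple binomial read model.
    The binomial coefficient is dropped: it is identical for all three genotypes,
    so it cancels in posteriors and likelihood ratios."""
    p_alt = (err, 0.5, 1.0 - err)  # chance that one read shows the alt allele
    return tuple(p ** alt * (1.0 - p) ** ref for p in p_alt)


def hwe_prior(f: float) -> GL:
    return ((1.0 - f) ** 2, 2.0 * f * (1.0 - f), f * f)


def allele_freq_ml(gls: Sequence[GL], iters: int = 200, tol: float = 1e-10) -> float:
    """EM estimate of the alt-allele frequency from genotype likelihoods (no calls)."""
    f = 0.1
    for _ in range(iters):
        prior = hwe_prior(f)
        expected_alt = 0.0
        for gl in gls:
            w = [l * p for l, p in zip(gl, prior)]  # unnormalized genotype posteriors
            expected_alt += (w[1] + 2.0 * w[2]) / sum(w)
        new_f = expected_alt / (2.0 * len(gls))
        if abs(new_f - f) < tol:
            return new_f
        f = new_f
    return f


def log_lik(gls: Sequence[GL], f: float) -> float:
    prior = hwe_prior(f)
    return sum(math.log(sum(l * p for l, p in zip(gl, prior))) for gl in gls)


def chi2_1df_sf(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def stage1_lrt(case_gls: Sequence[GL], ctrl_gls: Sequence[GL]) -> float:
    """Likelihood-ratio statistic for equal allele frequency in cases vs controls."""
    pooled = list(case_gls) + list(ctrl_gls)
    return 2.0 * (log_lik(case_gls, allele_freq_ml(case_gls))
                  + log_lik(ctrl_gls, allele_freq_ml(ctrl_gls))
                  - log_lik(pooled, allele_freq_ml(pooled)))


def stage2_allelic_p(case_g: Sequence[int], ctrl_g: Sequence[int]) -> float:
    """Pearson allelic chi-square test on called genotypes (0/1/2 alt alleles)."""
    a, c = sum(case_g), sum(ctrl_g)                  # alt allele counts
    b, d = 2 * len(case_g) - a, 2 * len(ctrl_g) - c  # ref allele counts
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    return chi2_1df_sf((a + b + c + d) * (a * d - b * c) ** 2 / denom)


def two_stage(snps, frac: float = 0.2) -> dict:
    """Stage 1 scores every SNP from reads; stage 2 tests only the top fraction."""
    scores = [stage1_lrt(s["case_gls"], s["ctrl_gls"]) for s in snps]
    k = max(1, round(frac * len(snps)))
    selected = sorted(range(len(snps)), key=lambda i: scores[i], reverse=True)[:k]
    # Only the selected SNPs would be sent to (expensive) LD-aware genotype calling.
    return {i: stage2_allelic_p(snps[i]["case_calls"], snps[i]["ctrl_calls"]) for i in selected}


def simulate_snp(rng: random.Random, f_case: float, f_ctrl: float, n: int = 200, depth: int = 4):
    """Toy data; the true genotypes stand in for LD-aware calls (an assumption)."""
    snp = {}
    for label, f in (("case", f_case), ("ctrl", f_ctrl)):
        calls, gls = [], []
        for _ in range(n):
            g = (rng.random() < f) + (rng.random() < f)
            alt = sum(rng.random() < (ERR, 0.5, 1.0 - ERR)[g] for _ in range(depth))
            calls.append(g)
            gls.append(genotype_likelihoods(depth - alt, alt))
        snp[label + "_calls"], snp[label + "_gls"] = calls, gls
    return snp


if __name__ == "__main__":
    rng = random.Random(1)
    # SNP 0 is associated (alt freq 0.6 vs 0.1); SNPs 1-9 are null (0.3 vs 0.3).
    snps = [simulate_snp(rng, 0.6, 0.1)] + [simulate_snp(rng, 0.3, 0.3) for _ in range(9)]
    for i, p in sorted(two_stage(snps, frac=0.2).items()):
        print(f"SNP {i}: stage-2 p = {p:.3g}")
```

The point of the design is that only `round(frac * M)` of `M` markers reach the expensive second stage, so the genotype-calling cost scales with `frac`, while the cheap stage-1 scan still touches every marker. How much power is retained depends on how well the stage-1 ranking tracks the stage-2 test; the figures quoted in the abstract come from the authors' studies, not from this sketch.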
Keywords: association study; next-generation sequencing; genotype; genotype likelihood function; testing
JEL-codes: C1 C10 C11 C14 C15 C16
Date: 2023
Downloads:
https://www.mdpi.com/2571-905X/6/1/29/pdf (application/pdf)
https://www.mdpi.com/2571-905X/6/1/29/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jstats:v:6:y:2023:i:1:p:29-481:d:1101384
Stats is currently edited by Mrs. Minnie Li
More articles in Stats from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.