Next-Generation Sequencing Data-Based Association Testing of a Group of Genetic Markers for Complex Responses Using a Generalized Linear Model Framework

Zheng Xu, Song Yan, Cong Wu, Qing Duan, Sixia Chen and Yun Li
Additional contact information
Zheng Xu: Department of Mathematics and Statistics, Wright State University, Dayton, OH 45324, USA
Song Yan: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Cong Wu: Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68508, USA
Qing Duan: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Sixia Chen: Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
Yun Li: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA

Mathematics, 2023, vol. 11, issue 11, 1-28

Abstract: Association testing is used to study the relationship between genetic variants and phenotypes, but most association studies rely on genotype-based testing. Testing methods that use next-generation sequencing (NGS) data directly, without genotype calling, have an advantage over genotype-based methods when genotype estimation is inaccurate. Our objective was to develop NGS data-based methods for association studies to fill this gap in the literature. Single-variant testing methods based on NGS data have been proposed, including our earlier single-variant method, the UNC combo method. We also previously proposed an NGS data-based group testing method using a linear model (LM) framework, which handles continuous responses. In this paper, we extend that framework to a generalized linear model (GLM) framework so that the methods can handle other types of responses, especially binary responses, which are common in association studies. To evaluate and compare the performance of the various estimators, we performed simulation studies. We found that all methods controlled Type I error, and that our NGS data-based methods performed better than genotype-based methods for other types of responses, including binary responses (logistic regression) and count responses (Poisson regression), especially when sequencing depth was low. Compared with our previously proposed LM-based methods, the new GLM-based methods for a group of genetic variables can handle more complex responses (for example, binary and count responses) in addition to continuous responses. Our methods fill the gap in the literature and show an advantage over the corresponding genotype-based methods.

Keywords: next-generation sequencing; association testing; generalized linear model; joint significance test; variable collapse test; genotype calling; score test; group testing; rare variant
JEL-codes: C
Date: 2023

Downloads: (external link)
https://www.mdpi.com/2227-7390/11/11/2560/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/11/2560/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:11:p:2560-:d:1163047

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:11:y:2023:i:11:p:2560-:d:1163047