Statistical Study Design for Analyzing Multiple Gene Loci Correlation in DNA Sequences
Pianpool Kamoljitprapa,
Fazil M. Baksh,
Andrea De Gaetano (),
Orathai Polsen and
Piyachat Leelasilapasart
Additional contact information
Pianpool Kamoljitprapa: Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand
Fazil M. Baksh: Department of Mathematics and Statistics, University of Reading, Reading RG6 6AH, UK
Andrea De Gaetano: Consiglio Nazionale delle Ricerche, CNR-IASI Rome and CNR-IRIB Palermo, 90146 Palermo, Italy
Orathai Polsen: Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand
Piyachat Leelasilapasart: Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand
Mathematics, 2023, vol. 11, issue 23, 1-14
Abstract:
This study presents a novel statistical and computational approach using nonparametric regression, which capitalizes on correlation structure to deal with the high-dimensional data often found in pharmacogenomics, for instance, in Crohn’s inflammatory bowel disease. The empirical correlation between the test statistics, investigated via simulation, can be used as an estimate of noise. The theoretical distribution of −log 10 ( p -value) is used to support the estimation of that optimal bandwidth for the model, which adequately controls type I error rates while maintaining reasonable power. Two proposed approaches, involving normal and Laplace-LD kernels, were evaluated by conducting a case-control study using real data from a genome-wide association study on Crohn’s disease. The study successfully identified single nucleotide polymorphisms on the NOD2 gene associated with the disease. The proposed method reduces the computational burden by approximately 33% with reasonable power, allowing for a more efficient and accurate analysis of genetic variants influencing drug responses. The study contributes to the advancement of statistical methodology for analyzing complex genetic data and is of practical advantage for the development of personalized medicine.
Keywords: DNA sequence; correlation structure; high-dimensional data; nonparametric regression; case-control study (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2023
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/2227-7390/11/23/4710/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/23/4710/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:23:p:4710-:d:1284382
Access Statistics for this article
Mathematics is currently edited by Ms. Emma He
More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().