Accurate and fast small p-value estimation for permutation tests in high-throughput genomic data analysis with the cross-entropy method
Shi Yang,
Shi Weiping,
Wang Mengqiao,
Lee Ji-Hyun,
Kang Huining and
Jiang Hui
Additional contact information
Shi Yang: Division of Biostatistics and Data Science, Department of Population Health Sciences and Department of Neuroscience and Regenerative Medicine, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA
Shi Weiping: College of Mathematics, Jilin University, Changchun, 130012, China
Wang Mengqiao: Department of Epidemiology and Biostatistics, School of Public Health, Chengdu Medical College, Chengdu, 610500, China
Lee Ji-Hyun: Division of Quantitative Sciences, University of Florida Health Cancer Center and Department of Biostatistics, University of Florida, Gainesville, FL 32610, USA
Kang Huining: University of New Mexico Comprehensive Cancer Center Biostatistics Shared Resource, University of New Mexico, Albuquerque, NM 87131, USA
Jiang Hui: Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
Statistical Applications in Genetics and Molecular Biology, 2023, vol. 22, issue 1, 22
Abstract:
Permutation tests are widely used for statistical hypothesis testing when the sampling distribution of the test statistic under the null hypothesis is analytically intractable or unreliable due to finite sample sizes. One critical challenge in applying permutation tests in genomic studies is that an enormous number of permutations is often needed to obtain reliable estimates of very small p-values, which leads to intensive computational effort. To address this issue, we develop algorithms for the accurate and efficient estimation of small p-values in permutation tests for paired and independent two-group genomic data. Our approaches leverage a novel framework that parameterizes the permutation sample spaces of these two types of data using the Bernoulli and conditional Bernoulli distributions, respectively, combined with the cross-entropy method. We demonstrate the performance of the proposed algorithms on two simulated datasets and two real-world gene expression datasets generated by microarray and RNA-Seq technologies, and compare them with existing methods such as crude permutations and SAMC. The results show that our approaches can achieve orders-of-magnitude gains in computational efficiency when estimating small p-values. Our approaches offer promising solutions for improving the computational efficiency of existing permutation test procedures and for developing new permutation-based testing methods in genomic data analysis.
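As an illustration of the idea for paired data, the following is a minimal sketch, not the authors' implementation. It assumes a one-sided test whose statistic is the sum of paired differences, so each permutation is a vector of sign flips, and it samples those flips from independent Bernoulli distributions whose parameters are tuned by the cross-entropy method. The function names, the tuning parameters (`rho`, the sample sizes, the clipping bounds) and the tolerance `TOL` are assumptions made for this example.

```python
import itertools
import numpy as np

TOL = 1e-9  # assumed tolerance so floating-point ties with the observed statistic count


def exact_paired_pvalue(d):
    """Exact one-sided p-value by enumerating all 2^n sign flips (small n only)."""
    d = np.asarray(d, dtype=float)
    signs = np.array(list(itertools.product([1.0, -1.0], repeat=d.size)))
    return float(np.mean(signs @ d >= d.sum() - TOL))


def ce_paired_pvalue(d, n_pilot=2000, n_final=20000, rho=0.1, max_iter=50, seed=0):
    """Importance-sampling estimate of P(T >= t_obs) under random sign flips.

    x[i] = True keeps the sign of d[i], False flips it. Under the null each
    x[i] is Bernoulli(0.5); the proposal uses Bernoulli(p[i]) instead.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(d, dtype=float)
    n = d.size
    t_obs = d.sum()
    p = np.full(n, 0.5)  # start from the null distribution

    def draw(p, size):
        x = rng.random((size, n)) < p
        stat = np.where(x, d, -d).sum(axis=1)
        # likelihood ratio: null probability 0.5^n over proposal probability
        log_w = n * np.log(0.5) - np.where(x, np.log(p), np.log1p(-p)).sum(axis=1)
        return x, stat, np.exp(log_w)

    for _ in range(max_iter):
        x, stat, w = draw(p, n_pilot)
        # intermediate level: upper rho-quantile, capped at the observed statistic
        gamma = min(np.quantile(stat, 1.0 - rho), t_obs)
        elite = stat >= gamma - TOL
        w_elite = w[elite]
        # weighted update of the Bernoulli parameters from the elite samples
        p = (w_elite[:, None] * x[elite]).sum(axis=0) / w_elite.sum()
        p = np.clip(p, 0.01, 0.99)  # keep the proposal away from degeneracy
        if gamma >= t_obs:
            break

    x, stat, w = draw(p, n_final)
    return float(np.mean(w * (stat >= t_obs - TOL)))


# Hypothetical paired differences, small enough for exact enumeration.
d = np.array([1.2, 0.8, 1.5, 0.9, 1.1, 0.7, 1.3, 1.0,
              0.6, 1.4, 0.95, 1.05, -0.2, 0.85, 1.15, 0.75])
print("exact:", exact_paired_pvalue(d))
print("cross-entropy estimate:", ce_paired_pvalue(d))
```

With 16 pairs the exact p-value can still be obtained by enumeration, which makes it a convenient check on the estimate. For realistic sample sizes enumeration is infeasible, and crude permutation sampling would need far more draws than the 20,000 used in the final stage here to observe the tail event at all.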
Keywords: genomic data analysis; importance sampling; Monte Carlo simulation; p-value; permutation test; the cross-entropy method
Date: 2023
Downloads: (external link)
https://doi.org/10.1515/sagmb-2021-0067 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.
Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:22:y:2023:i:1:p:22:n:1
Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html
DOI: 10.1515/sagmb-2021-0067
Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.