Sample size calculations for the differential expression analysis of RNA-seq data using a negative binomial regression model

Xiaohong, Li; Dongfeng, Wu; G.F., Cooper Nigel; N., Rai Shesh

Sample size calculations for the differential expression analysis of RNA-seq data using a negative binomial regression model

Li Xiaohong (), Wu Dongfeng, Cooper Nigel G.F. and Rai Shesh N.
Additional contact information
Li Xiaohong: Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, KY 40202, USA
Wu Dongfeng: Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, KY 40202, USA
Cooper Nigel G.F.: Department of Anatomical Sciences and Neurobiology, School of Medicine, University of Louisville, Louisville, KY, USA
Rai Shesh N.: Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, KY 40202, USA

Statistical Applications in Genetics and Molecular Biology, 2019, vol. 18, issue 1, 17

Abstract: High throughput RNA sequencing (RNA-seq) technology is increasingly used in disease-related biomarker studies. A negative binomial distribution has become the popular choice for modeling read counts of genes in RNA-seq data due to over-dispersed read counts. In this study, we propose two explicit sample size calculation methods for RNA-seq data using a negative binomial regression model. To derive these new sample size formulas, the common dispersion parameter and the size factor as an offset via a natural logarithm link function are incorporated. A two-sided Wald test statistic derived from the coefficient parameter is used for testing a single gene at a nominal significance level 0.05 and multiple genes at a false discovery rate 0.05. The variance for the Wald test is computed from the variance-covariance matrix with the parameters estimated from the maximum likelihood estimates under the unrestricted and constrained scenarios. The performance and a side-by-side comparison of our new formulas with three existing methods with a Wald test, a likelihood ratio test or an exact test are evaluated via simulation studies. Since other methods are much computationally extensive, we recommend our M1 method for quick and direct estimation of sample sizes in an experimental design. Finally, we illustrate sample sizes estimation using an existing breast cancer RNA-seq data.

Keywords: a Wald test; differentially expressed genes (DEGs); power analysis; RNA-seq; sample size (search for similar items in EconPapers)
Date: 2019
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://doi.org/10.1515/sagmb-2018-0021 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:18:y:2019:i:1:p:17:n:2

Ordering information: This journal article can be ordered from
https://www.degruyte ... urnal/key/sagmb/html

DOI: 10.1515/sagmb-2018-0021

Access Statistics for this article

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla ().