No counts, no variance: allowing for loss of degrees of freedom when assessing biological variability from RNA-seq data

Aaron T. L. Lun and Gordon K. Smyth
Additional contact information
Aaron T. L. Lun: The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Parade, Parkville, VIC 3052, Australia
Gordon K. Smyth: The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Parade, Parkville, VIC 3052, Australia

Statistical Applications in Genetics and Molecular Biology, 2017, vol. 16, issue 2, 83-93

Abstract: RNA sequencing (RNA-seq) is widely used to study gene expression changes associated with treatments or biological conditions. Many popular methods for detecting differential expression (DE) from RNA-seq data use generalized linear models (GLMs) fitted to the read counts across independent replicate samples for each gene. This article shows that the standard formula for the residual degrees of freedom (d.f.) in a linear model is overstated when the model contains fitted values that are exactly zero. Such fitted values occur whenever all the counts in a treatment group are zero as well as in more complex models such as those involving paired comparisons. This misspecification results in underestimation of the genewise variances and loss of type I error control. This article proposes a formula for the reduced residual d.f. that restores error control in simulated RNA-seq data and improves detection of DE genes in a real data analysis. The new approach is implemented in the quasi-likelihood framework of the edgeR software package. The results of this article also apply to RNA-seq analyses that apply linear models to log-transformed counts, such as those in the limma software package, and more generally to any count-based GLM where exactly zero fitted values are possible.
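Illustration: the abstract does not state the proposed formula, so the following is only a minimal sketch of the idea for the simplest case it mentions, a one-way layout in which all counts in a treatment group are zero. It assumes that such a group has a fitted mean of exactly zero and contributes no information about variability, so its observations and its coefficient are both dropped before counting d.f. The function name `residual_df_one_way` and the example data are hypothetical, and this is not the edgeR implementation.

```python
from collections import defaultdict


def residual_df_one_way(counts, groups):
    """Return (standard_df, reduced_df) for one gene in a one-way layout.

    Assumption for illustration: a group whose counts are all zero has a
    fitted value of exactly zero and carries no information on variance.
    """
    by_group = defaultdict(list)
    for count, group in zip(counts, groups):
        by_group[group].append(count)

    # Standard formula: number of observations minus number of group means.
    standard_df = len(counts) - len(by_group)

    # Keep only groups with at least one non-zero count.
    informative = {g: c for g, c in by_group.items() if any(x > 0 for x in c)}

    # Reduced d.f.: remaining observations minus remaining group means.
    n_obs = sum(len(c) for c in informative.values())
    reduced_df = n_obs - len(informative)
    return standard_df, reduced_df


# Hypothetical gene: three control replicates with zero counts,
# three treated replicates with non-zero counts.
counts = [0, 0, 0, 12, 7, 9]
groups = ["control"] * 3 + ["treated"] * 3
standard_df, reduced_df = residual_df_one_way(counts, groups)
print(standard_df, reduced_df)  # 4 2
```

In this example the standard formula reports 4 residual d.f., while only the 2 d.f. from the treated group carry information about variability, which is the kind of overstatement the abstract describes.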

Keywords: differential expression; generalized linear models; quasi-likelihood; RNA sequencing
Date: 2017

Downloads:
https://doi.org/10.1515/sagmb-2017-0010
For access to full text, subscription to the journal or payment for the individual article is required.

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:16:y:2017:i:2:p:83-93:n:1005

Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html

DOI: 10.1515/sagmb-2017-0010

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.

 
Page updated 2025-03-19
Handle: RePEc:bpj:sagmbi:v:16:y:2017:i:2:p:83-93:n:1005