Exploring the sampling universe of RNA-seq
Tauber Stefanie and Arndt von Haeseler
Additional contact information
Tauber Stefanie: Center for Integrative Bioinformatics, Max F Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna, Austria
Arndt von Haeseler: Center for Integrative Bioinformatics, Max F Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna, Austria
Statistical Applications in Genetics and Molecular Biology, 2013, vol. 12, issue 2, 175-188
Abstract:
How deep is deep enough? While RNA-sequencing is a well-established technology, the sequencing depth required to detect all expressed genes is not known. If we set aside the biological overhead and meta-information, we are dealing with a classical sampling process. Such sampling processes are well known from population genetics and have been thoroughly investigated. Here we use the Pitman Sampling Formula to model the sampling process of RNA-sequencing. In doing so, we characterize the sampling by means of two parameters that capture the combined effect of different sequencing technologies, protocols and their associated biases. We distinguish between two levels of sampling: the number of reads per gene and the number of reads starting at each position of a specific gene. The latter allows us to evaluate the theoretical expectation of uniform coverage and the performance of sequencing protocols in that respect. Most importantly, given a pilot sequencing experiment, we provide an estimate of the size of the underlying sampling universe and, based on these findings, evaluate an estimator for the number of newly detected genes when sequencing an additional sample of arbitrary size.
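The sampling view described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation or their estimator. It assumes the standard sequential two-parameter scheme whose partition distribution is the Pitman Sampling Formula: after n reads covering k genes, the next read hits a new gene with probability (theta + k * alpha) / (theta + n), and otherwise hits an already detected gene j with probability proportional to (n_j - alpha). The function name `simulate_pitman_sampling` and the parameter values are hypothetical and chosen for illustration only.

```python
import random


def simulate_pitman_sampling(n_reads, alpha, theta, seed=0):
    """Sequentially sample reads under the two-parameter (alpha, theta) scheme.

    Returns (counts, discovery_curve): counts[j] is the number of reads
    assigned to the j-th detected gene, and discovery_curve[i] is the number
    of distinct genes seen after i + 1 reads.
    """
    if not 0 <= alpha < 1 or theta <= -alpha:
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    counts = []
    discovery_curve = []
    for n in range(n_reads):
        k = len(counts)
        # The first read always detects a new gene; afterwards a new gene
        # appears with probability (theta + k * alpha) / (theta + n).
        if n == 0 or rng.random() < (theta + k * alpha) / (theta + n):
            counts.append(1)
        else:
            # Otherwise pick a detected gene with weight (n_j - alpha).
            weights = [c - alpha for c in counts]
            j = rng.choices(range(k), weights=weights)[0]
            counts[j] += 1
        discovery_curve.append(len(counts))
    return counts, discovery_curve


if __name__ == "__main__":
    # Arbitrary illustrative parameters, not values fitted to any data set.
    counts, curve = simulate_pitman_sampling(5_000, alpha=0.5, theta=50.0, seed=1)
    for depth in (100, 1_000, 5_000):
        print(depth, curve[depth - 1])
```

Printing the discovery curve at increasing depths shows how the number of detected genes grows with additional reads under the assumed parameters, which is the kind of question the abstract poses for a pilot experiment.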
Keywords: RNA sequencing; sampling; modeling RNA-seq; deep sequencing; Pitman sampling formula
Date: 2013
Downloads:
https://doi.org/10.1515/sagmb-2012-0049 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.
Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:12:y:2013:i:2:p:175-188:n:1002
Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html
DOI: 10.1515/sagmb-2012-0049
Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.