From noise to models to numbers: Evaluating negative binomial models and parameter estimations in single-cell RNA-seq
Yiling Wang, Zhanpeng Shu, Zhixing Cao and Ramon Grima
PLOS Computational Biology, 2026, vol. 22, issue 3, 1-37
Abstract:
The Negative Binomial (NB) distribution is widely used to approximate transcript count distributions in single-cell RNA sequencing (scRNA-seq) data, yet the reason for its ubiquity is not fully understood. Here, we employ a computationally efficient model selection technique to map the relationship between the best-fit models – Beta-Poisson (Telegraph), NB, and Poisson – and the kinetic parameters that govern gene expression stochasticity. Our findings reveal that the NB distribution closely approximates simulated data (incorporating both biological and technical noise) within an intermediate range of the sum of the gene activation and inactivation rates normalized by the mRNA degradation rate. This range expands with decreasing mean expression, increasing technical noise, and larger sample sizes. The results imply that: (i) good NB fits occur in diverse parameter regimes without exclusively indicating transcriptional bursting; (ii) for small sample sizes, biological noise predominantly shapes the NB profile even when technical noise is present; (iii) under steady-state conditions, gene-specific parameters (burst size and frequency) estimated in regions where the NB model fits well typically show large relative errors, even after corrections for technical noise; and (iv) gene ranking by burst frequency remains reliably accurate, suggesting that burst parameters are most informative in a relative sense. Finally, applying technical-noise–corrected model fitting to scRNA-seq data confirms that a substantial fraction of mammalian genes fall within these NB-fitting regimes, despite lacking transcriptional bursting.
Author summary:
Single-cell RNA sequencing (scRNA-seq) measures mRNA molecule counts in individual cells. For most genes, these counts are well fit by a negative binomial (NB) distribution, and NB fits are often interpreted as evidence for transcriptional bursting. We asked when an NB model is expected to arise from a mechanistic gene-expression process, and what biological meaning can be safely assigned to its parameters. We combine the standard two-state telegraph model of promoter switching with a binomial model of transcript capture, and introduce the approximate expected Bayesian information criterion (aeBIC). aeBIC predicts which distribution (telegraph, NB, or Poisson) would be chosen by likelihood/BIC model selection. We show that NB fits are optimal in an intermediate regime of promoter switching relative to mRNA decay, and that this regime expands for low mean expression, larger sample sizes, and increased cell-to-cell variability in capture probability. Consequently, excellent NB fits can occur well outside the classical bursting limit. In these regimes, estimating burst size and burst frequency from NB parameters can incur large absolute errors, although relative comparisons are more robust: ranking genes by inferred burst frequency is usually preserved. Our results provide practical guidance for model choice and for interpreting fitted burst parameters in single-cell genomics.
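Illustrative sketch (not from the article): the kind of likelihood/BIC comparison described in the author summary can be tried on simulated data. The Python below draws steady-state telegraph counts through the Beta-Poisson representation, applies binomial capture as a rescaling of the Poisson mean, and compares Poisson and NB fits by ordinary BIC. It is not the authors' aeBIC and it omits the telegraph fit. All function names, parameter values, and the grid search over the NB dispersion are assumptions made for illustration, and the sketch assumes the simulated mean count is positive.

```python
import math
import random


def poisson_sample(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(k_on, k_off, rho, capture, n_cells, seed=0):
    """Steady-state telegraph counts after binomial capture (hypothetical helper).

    Rates are in units of the mRNA degradation rate. Binomial thinning of a
    Poisson variable is again Poisson, so capture only rescales the mean.
    """
    rng = random.Random(seed)
    return [poisson_sample(capture * rho * rng.betavariate(k_on, k_off), rng)
            for _ in range(n_cells)]


def poisson_loglik(counts):
    lam = sum(counts) / len(counts)  # maximum-likelihood Poisson mean
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in counts)


def nb_loglik(counts, r):
    mu = sum(counts) / len(counts)  # maximum-likelihood NB mean for fixed r
    p = r / (r + mu)
    return sum(math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
               + r * math.log(p) + x * math.log1p(-p) for x in counts)


def best_nb_loglik(counts):
    # Crude log-spaced grid over the dispersion r (1e-2 to 1e3); an assumption
    # made for brevity, a proper optimizer would be used in practice.
    grid = [10 ** (-2 + 5 * i / 200) for i in range(201)]
    return max(nb_loglik(counts, r) for r in grid)


def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2.0 * loglik


# Illustrative parameter values only, not taken from the article.
counts = simulate_counts(k_on=0.5, k_off=5.0, rho=60.0, capture=0.3, n_cells=2000)
bic_poisson = bic(poisson_loglik(counts), 1, len(counts))
bic_nb = bic(best_nb_loglik(counts), 2, len(counts))
print("Poisson BIC:", round(bic_poisson, 1))
print("NB BIC:     ", round(bic_nb, 1))
```

The model with the lower BIC is preferred. Varying `k_on + k_off`, `capture`, or `n_cells` in this sketch gives a rough feel for how the preferred model can change with switching rates, technical noise, and sample size.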
Date: 2026
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014014 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 14014&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1014014
DOI: 10.1371/journal.pcbi.1014014
More articles in PLOS Computational Biology from Public Library of Science