Binomial models uncover biological variation during feature selection of droplet-based single-cell RNA sequencing

Breanne Sparta, Timothy Hamilton, Gunalan Natesan, Samuel D Aragones and Eric J Deeds

PLOS Computational Biology, 2024, vol. 20, issue 9, 1-31

Abstract: Effective analysis of single-cell RNA sequencing (scRNA-seq) data requires a rigorous distinction between technical noise and biological variation. In this work, we propose a simple feature selection model, termed “Differentially Distributed Genes” or DDGs, where a binomial sampling process for each mRNA species produces a null model of technical variation. Using scRNA-seq data where cell identities have been established a priori, we find that the DDG model of biological variation outperforms existing methods. We demonstrate that DDGs distinguish a validated set of real biologically varying genes, minimize neighborhood distortion, and enable accurate partitioning of cells into their established cell-type groups.

Author summary: Single-cell omics technologies measure tens of thousands of genes in up to millions of individual cells. Yet, the sheer dimensionality of the data poses a challenge to its intelligibility. A typical first step in reducing the dimensionality is to apply a feature selection model that distinguishes real biological signals from technical noise. Yet without an appropriate model of technical noise, feature selection can introduce bias into the downstream analysis of the data. In this work, we demonstrate that, in the analysis of single-cell RNA sequencing data, the standard approach of finding Highly Variable Genes (HVGs) induces severe distortion and bias into the analysis of data, when compared to true biological variation that is known a priori. To address this issue, we present a new feature selection model and demonstrate that our model outperforms existing methods in its ability to accurately identify real biological variation.
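The abstract describes a binomial sampling process per mRNA species as a null model of technical variation, but this record does not give the authors' exact formulation. The following is only a minimal sketch of one way such a null could be set up, under stated assumptions: every molecule of a gene lands in cell c with probability equal to that cell's share of all counts, so the count of a gene with N_g total molecules in cell c is Binomial(N_g, f_c), and a gene is scored by how far its observed fraction of zero-count cells departs from the binomial expectation. The function name `zero_fraction_scores`, the choice of the zero fraction as the statistic, and the independence approximation for the variance are all illustrative assumptions, not the published DDG procedure.

```python
import numpy as np


def zero_fraction_scores(counts):
    """Score genes against a binomial null of purely technical sampling.

    counts: (cells x genes) array of non-negative integer counts.
    Assumes at least two cells have non-zero total counts.
    Returns (observed, expected, z): per-gene observed zero fraction,
    expected zero fraction under the null, and an approximate z-score.
    """
    counts = np.asarray(counts, dtype=np.int64)
    n_cells = counts.shape[0]

    cell_totals = counts.sum(axis=1)             # molecules captured per cell
    gene_totals = counts.sum(axis=0)             # N_g: molecules per gene
    capture_share = cell_totals / counts.sum()   # f_c: assumed per-cell sampling probability

    # Under the null, P(gene g absent from cell c) = (1 - f_c) ** N_g.
    # Computed in log space for numerical stability.
    p_zero = np.exp(np.outer(np.log1p(-capture_share), gene_totals))

    expected = p_zero.mean(axis=0)
    # Variance of the zero fraction, treating cells as independent
    # (an approximation: the zero indicators are weakly correlated).
    variance = (p_zero * (1.0 - p_zero)).sum(axis=0) / n_cells**2

    observed = (counts == 0).mean(axis=0)
    z = (observed - expected) / np.sqrt(np.maximum(variance, 1e-12))
    return observed, expected, z


# Toy usage: gene 0 is spread evenly, gene 1 is confined to one cell.
toy_counts = np.array([[5, 20],
                       [5, 0],
                       [5, 0],
                       [5, 0]])
observed, expected, z = zero_fraction_scores(toy_counts)
```

In this sketch, a large positive z marks a gene that is absent from more cells than sampling alone would predict, which is the kind of departure from the technical null that a feature selection step could use to rank genes.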

Date: 2024

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012386 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 12386&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1012386

DOI: 10.1371/journal.pcbi.1012386

More articles in PLOS Computational Biology from Public Library of Science

 
Page updated 2025-05-31
Handle: RePEc:plo:pcbi00:1012386