D3Impute: Dropout-aware discrimination, distribution-aware modeling, and density-guide imputation for scRNA-seq data

Siyi Huang, Linfeng Jiang, Ming Yi and Yuan Zhu

PLOS Computational Biology, 2025, vol. 21, issue 12, 1-38

Abstract: Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular heterogeneity. A major challenge, however, lies in the prevalence of non-biological zeros: false measurements caused by technical limitations that mask a cell’s true transcriptome. Distinguishing these artifacts from true biological zeros, where a gene is genuinely not expressed, remains a key hurdle for computational methods, because misclassification can distort biological signals during data recovery. To overcome this, we introduce D3Impute, a discriminative imputation framework built on three key innovations: (1) a distribution-aware normalization step that adapts to dataset-specific characteristics while preserving meaningful biological variation; (2) a dual-network discriminator that uses bulk RNA-seq data as a biological reference to accurately identify non-biological zeros while retaining the true biological zeros; and (3) a density-guided imputation engine that recovers expression values while maintaining local cellular neighborhood structures. Through comprehensive benchmarking against 12 state-of-the-art methods across six diverse datasets, D3Impute demonstrates consistent and significant improvements in essential downstream analyses, including cell clustering, trajectory inference, and differential expression detection. Furthermore, we provide an extensive practical evaluation of D3Impute, demonstrating its robustness across varying data qualities and providing clear guidelines for optimal application. By offering a robust, biologically informed, and user-oriented solution, D3Impute not only enhances scRNA-seq data analysis but also offers a generalizable framework for handling zero-inflated data in computational biology.

Author summary: Single-cell RNA sequencing (scRNA-seq) reveals cellular heterogeneity but is compromised by technical “dropout” events, non-biological zeros that obscure true expression patterns. To address this, we developed D3Impute, a computational framework built on three core innovations: (1) Distribution-aware modeling adapts normalization to each cell’s statistical properties, moving beyond one-size-fits-all approaches; (2) Dropout-aware discrimination integrates cell–cell networks from scRNA-seq data with gene co-expression networks from bulk RNA-seq to accurately identify non-biological zeros; (3) Density-guided imputation employs a neighborhood-preserving algorithm with dynamic weighting to recover missing values while preventing over-smoothing and retaining meaningful cellular heterogeneity. Together, these components form a principled and interpretable framework that significantly enhances the accuracy of scRNA-seq data analysis.
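
The general idea of filling only the zeros judged to be technical, using nearby cells, can be illustrated with a minimal sketch. This is not the D3Impute algorithm: the function name `impute_flagged_zeros`, the Euclidean distance, the inverse-distance weights, and the fixed `k` are all assumptions made for illustration, and the dropout mask is taken as given rather than produced by a discriminator.

```python
import numpy as np

def impute_flagged_zeros(X, dropout_mask, k=2):
    """Fill flagged entries with a distance-weighted mean of the k nearest cells.

    X            : cells x genes expression matrix (hypothetical input).
    dropout_mask : boolean matrix, True where a zero is assumed to be technical.
    Entries that are not flagged (including biological zeros) are left untouched.
    """
    X = np.asarray(X, dtype=float)
    dropout_mask = np.asarray(dropout_mask, dtype=bool)
    # Pairwise Euclidean distances between cells.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    out = X.copy()
    for i in range(X.shape[0]):
        genes = np.flatnonzero(dropout_mask[i])
        if genes.size == 0:
            continue
        nbrs = np.argsort(dist[i])[:k]      # indices of the k closest cells
        w = 1.0 / (dist[i, nbrs] + 1e-8)    # closer cells weigh more
        w /= w.sum()
        out[i, genes] = w @ X[np.ix_(nbrs, genes)]
    return out

# Toy example: only the zero at cell 0, gene 1 is flagged as technical.
X = np.array([[5.0, 0.0, 3.0],
              [5.0, 4.0, 3.0],
              [0.0, 0.0, 9.0]])
mask = np.zeros_like(X, dtype=bool)
mask[0, 1] = True
imputed = impute_flagged_zeros(X, mask, k=1)
```

In this toy case the flagged entry is taken from the single nearest cell, while the unflagged zeros in the third cell stay at zero, which is the behavior the abstract describes as retaining true biological zeros.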

Date: 2025

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013744 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13744&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013744

DOI: 10.1371/journal.pcbi.1013744

More articles in PLOS Computational Biology from Public Library of Science

 
Page updated 2025-12-07
Handle: RePEc:plo:pcbi00:1013744