
SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data

Jing Qi, Yang Zhou, Zicen Zhao and Shuilin Jin

PLOS Computational Biology, 2021, vol. 17, issue 6, 1-20

Abstract: Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Because only a small number of mRNA copies can be extracted from each cell, scRNA-seq data contain a large number of dropouts, which hinder downstream analysis. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), that performs block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies dropout events based on gene expression levels and the variation of gene expression across similar cells and similar genes, and it performs block imputation of the dropouts using expression values from similar cells that are unaffected by dropouts. Results on simulated and real datasets suggest that SDImpute is an effective tool for recovering the data while preserving the heterogeneity of gene expression across cells. Compared with state-of-the-art imputation methods, SDImpute improves the accuracy of downstream analyses, including clustering, visualization, and differential expression analysis.

Author summary: Single-cell RNA sequencing (scRNA-seq) allows researchers to analyze the gene expression of thousands of single cells simultaneously. However, the low amount of extracted mRNA leads to a large number of dropout events, which introduce computational challenges and hinder downstream analysis. To address this problem, we developed SDImpute, a novel statistical method that recovers scRNA-seq data using cell-level and gene-level information. The goal of our algorithm is to denoise scRNA-seq data while preserving the biological nature of gene expression. Combining SDImpute with downstream analysis tools, we designed experiments that use matched bulk expression data and known cell labels as criteria to validate the performance of our method on both simulated and real datasets. Moreover, we offer an R package with detailed instructions and an example input dataset. We hope that SDImpute will help researchers identify the mechanisms underlying biological processes through analysis of scRNA-seq data.
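
As a rough illustration of the general idea described in the abstract (flag zeros that look like dropouts, then fill them from similar cells that do express the gene), a minimal Python sketch might look like the following. This is not the SDImpute algorithm and not the API of the authors' R package: it ignores gene-level similarity, uses a simple neighbor-detection rule in place of the statistical dropout identification, and does not do block imputation. The function name `impute_zeros_from_similar_cells` and the parameters `k` and `min_detect_frac` are hypothetical, chosen only for this example.

```python
import numpy as np


def impute_zeros_from_similar_cells(expr, k=2, min_detect_frac=0.5):
    """Toy dropout imputation. NOT the SDImpute algorithm (illustrative only).

    expr: genes x cells matrix of non-negative expression values.
    A zero is treated as a likely dropout when at least `min_detect_frac`
    of the cell's k most similar cells express that gene; it is then
    replaced by the mean of the nonzero values in those neighbors.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes, n_cells = expr.shape

    # Cell-cell similarity: Pearson correlation of log-transformed profiles.
    sim = np.corrcoef(np.log1p(expr).T)
    sim = np.nan_to_num(sim, nan=-1.0)  # constant profiles give NaN
    np.fill_diagonal(sim, -np.inf)      # a cell is not its own neighbor

    imputed = expr.copy()
    for j in range(n_cells):
        neighbors = np.argsort(sim[j])[::-1][:k]
        neighbor_expr = expr[:, neighbors]
        detected = neighbor_expr > 0

        # Mean over neighbors that express the gene (values untouched by zeros).
        counts = detected.sum(axis=1)
        nonzero_mean = np.divide(
            neighbor_expr.sum(axis=1),
            counts,
            out=np.zeros(n_genes),
            where=counts > 0,
        )

        # Assumed rule: a zero is a dropout if enough neighbors detect the gene.
        likely_dropout = (expr[:, j] == 0) & (detected.mean(axis=1) >= min_detect_frac)
        imputed[likely_dropout, j] = nonzero_mean[likely_dropout]
    return imputed
```

Zeros that are shared by a cell's neighbors are left untouched in this sketch, which mirrors the abstract's stated aim of recovering dropouts without erasing true differences in expression between cell groups.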

Date: 2021

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009118 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 09118&type=printable (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1009118

DOI: 10.1371/journal.pcbi.1009118



Handle: RePEc:plo:pcbi00:1009118