
RgCop-A regularized copula based method for gene selection in single cell RNA-seq data

Snehalika Lall, Sumanta Ray and Sanghamitra Bandyopadhyay

PLOS Computational Biology, 2021, vol. 17, issue 10, 1-19

Abstract: Gene selection in large, unannotated single cell RNA sequencing (scRNA-seq) data is a crucial preliminary step of downstream analysis. Existing approaches, which are primarily based on high variation (highly variable genes) or significantly high expression (highly expressed genes), fail to provide a stable and predictive feature set because of the technical noise present in the data. Here, we propose RgCop, a novel regularized copula based method for gene selection from large single cell RNA-seq data. RgCop utilizes copula correlation (Ccor), a robust, equitable dependence measure that captures multivariate dependency among a set of genes in single cell expression data. We formulate an objective function by adding an l1 regularization term to Ccor to penalize the redundant coefficients of features/genes, resulting in a non-redundant, effective set of features/genes. Results show a significant improvement in clustering/classification performance on real-life scRNA-seq data over other state-of-the-art methods. RgCop performs extremely well in capturing dependence among the features of noisy data owing to the scale-invariant property of copula, thereby improving the stability of the method. Moreover, the differentially expressed (DE) genes identified from the clusters of scRNA-seq data are found to provide an accurate annotation of cells. Finally, the features/genes obtained from RgCop are able to annotate unknown cells with high accuracy.

Author Summary: The existing approaches for gene selection, which are based on high variation (highly variable genes) or significantly high expression (highly expressed genes), fail to provide a stable and predictive feature/gene set. Since single cell data is susceptible to technical noise, the quality of the genes selected prior to clustering is of crucial importance in the preliminary steps of downstream analysis.
Here, we propose a novel regularized copula based method for gene selection that leverages the copula correlation (Ccor) measure to capture cell-to-cell variability within the data. The proposed objective function uses an l1 regularization term to penalize the redundant coefficients of features/genes. We obtain a significant improvement in the clustering/classification performance of cells over other state-of-the-art methods. Owing to the scale-invariant property of copula, RgCop is impervious to technical noise, an acute issue in scRNA-seq data analysis. Moreover, the selected features/genes are able to identify unknown cells with high accuracy. Finally, RgCop is applicable to identifying rare cell clusters or minor subpopulations within single cell data.
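
The scale-invariance property mentioned above can be illustrated with a minimal sketch. This is not the paper's Ccor measure or the RgCop objective: it uses Spearman's rank correlation, computed on rank-based pseudo-observations, as a simple stand-in for a copula-based dependence measure. The function names, the simulated "genes", and the chosen transformations are all assumptions made for illustration.

```python
import numpy as np


def pseudo_observations(x):
    """Map a 1-D sample into (0, 1) using its ranks (empirical copula transform).

    Assumes continuous data with no ties.
    """
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n
    return ranks / (len(x) + 1)


def rank_dependence(x, y):
    """Pearson correlation of pseudo-observations, i.e. Spearman's rho.

    A stand-in for a copula-based measure, NOT the paper's Ccor.
    """
    u, v = pseudo_observations(x), pseudo_observations(y)
    return float(np.corrcoef(u, v)[0, 1])


# Two simulated, dependent "expression" vectors (hypothetical data).
rng = np.random.default_rng(0)
gene_a = rng.gamma(shape=2.0, scale=3.0, size=500)
gene_b = gene_a + rng.normal(scale=2.0, size=500)

# Strictly increasing transformations: log1p on one gene, rescaling on the other.
gene_a_t = np.log1p(gene_a)
gene_b_t = 10.0 * gene_b + 5.0

rank_raw = rank_dependence(gene_a, gene_b)
rank_transformed = rank_dependence(gene_a_t, gene_b_t)

pearson_raw = float(np.corrcoef(gene_a, gene_b)[0, 1])
pearson_transformed = float(np.corrcoef(gene_a_t, gene_b_t)[0, 1])

print("rank-based:", rank_raw, rank_transformed)
print("Pearson:   ", pearson_raw, pearson_transformed)
```

Because ranks are unchanged by strictly increasing transformations, the rank-based value is the same before and after transforming the data, whereas the Pearson correlation of the raw values changes under the nonlinear log transform.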

Date: 2021

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009464 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 09464&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1009464

DOI: 10.1371/journal.pcbi.1009464

More articles in PLOS Computational Biology from Public Library of Science

 
Handle: RePEc:plo:pcbi00:1009464