STREAK: A supervised cell surface receptor abundance estimation strategy for single cell RNA-sequencing data using feature selection and thresholded gene set scoring

Javaid, Azka; Frost, Hildreth Robert

STREAK: A supervised cell surface receptor abundance estimation strategy for single cell RNA-sequencing data using feature selection and thresholded gene set scoring

Azka Javaid and Hildreth Robert Frost

PLOS Computational Biology, 2023, vol. 19, issue 8, 1-24

Abstract: The accurate estimation of cell surface receptor abundance for single cell transcriptomics data is important for the tasks of cell type and phenotype categorization and cell-cell interaction quantification. We previously developed an unsupervised receptor abundance estimation technique named SPECK (Surface Protein abundance Estimation using CKmeans-based clustered thresholding) to address the challenges associated with accurate abundance estimation. In that paper, we concluded that SPECK results in improved concordance with Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) data relative to comparative unsupervised abundance estimation techniques using only single-cell RNA-sequencing (scRNA-seq) data. In this paper, we outline a new supervised receptor abundance estimation method called STREAK (gene Set Testing-based Receptor abundance Estimation using Adjusted distances and cKmeans thresholding) that leverages associations learned from joint scRNA-seq/CITE-seq training data and a thresholded gene set scoring mechanism to estimate receptor abundance for scRNA-seq target data. We evaluate STREAK relative to both unsupervised and supervised receptor abundance estimation techniques using two evaluation approaches on six joint scRNA-seq/CITE-seq datasets that represent four human and mouse tissue types. We conclude that STREAK outperforms other abundance estimation strategies and provides a more biologically interpretable and transparent statistical model.Author summary: Herein, we present an overview of our recently developed supervised receptor abundance estimation technique, STREAK (gene Set Testing-based Receptor abundance Estimation using Adjusted distances and cKmeans thresholding), which leverages co-expression associations learned from joint scRNA-seq/CITE-seq data to perform approximate abundance estimation. More specifically, STREAK functions by utilizing these expression associations to develop weighted membership gene sets, which are next thresholded following a gene set scoring procedure. These thresholded scores are set to the estimated abundance profiles.

Date: 2023
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011413 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 11413&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1011413

DOI: 10.1371/journal.pcbi.1011413

Access Statistics for this article

More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol ().