SIDEseq: A Cell Similarity Measure Defined by Shared Identified Differentially Expressed Genes for Single-Cell RNA sequencing Data

Schiffman, Courtney; Lin, Christina; Shi, Funan; Chen, Luonan; Sohn, Lydia; Huang, Haiyan

SIDEseq: A Cell Similarity Measure Defined by Shared Identified Differentially Expressed Genes for Single-Cell RNA sequencing Data

Courtney Schiffman, Christina Lin, Funan Shi, Luonan Chen, Lydia Sohn and Haiyan Huang ()
Additional contact information
Courtney Schiffman: UC Berkeley
Christina Lin: UC Berkeley
Funan Shi: UC Berkeley
Luonan Chen: Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
Lydia Sohn: UC Berkeley
Haiyan Huang: UC Berkeley

Statistics in Biosciences, 2017, vol. 9, issue 1, No 11, 200-216

Abstract: Abstract One goal of single-cell RNA sequencing (scRNA seq) is to expose possible heterogeneity within cell populations due to meaningful, biological variation. Examining cell-to-cell heterogeneity, and further, identifying subpopulations of cells based on scRNA seq data has been of common interest in life science research. A key component to successfully identifying cell subpopulations (or clustering cells) is the (dis)similarity measure used to group the cells. In this paper, we introduce a novel measure, named SIDEseq, to assess cell-to-cell similarity using scRNA seq data. SIDEseq first identifies a list of putative differentially expressed (DE) genes for each pair of cells. SIDEseq then integrates the information from all the DE gene lists (corresponding to all pairs of cells) to build a similarity measure between two cells. SIDEseq can be implemented in any clustering algorithm that requires a (dis)similarity matrix. This new measure incorporates information from all cells when evaluating the similarity between any two cells, a characteristic not commonly found in existing (dis)similarity measures. This property is advantageous for two reasons: (a) borrowing information from cells of different subpopulations allows for the investigation of pairwise cell relationships from a global perspective and (b) information from other cells of the same subpopulation could help to ensure a robust relationship assessment. We applied SIDEseq to a newly generated human ovarian cancer scRNA seq dataset, a public human embryo scRNA seq dataset, and several simulated datasets. The clustering results suggest that the SIDEseq measure is capable of uncovering important relationships between cells, and outperforms or at least does as well as several popular (dis)similarity measures when used on these datasets.

Keywords: single-cell RNA sequencing (scRNA seq); subpopulation identification; single-cell clustering; similarity measure; ovarian cancer; EMT inducers (Thrombin; TGFB-1) (search for similar items in EconPapers)
Date: 2017
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
http://link.springer.com/10.1007/s12561-017-9194-z Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:stabio:v:9:y:2017:i:1:d:10.1007_s12561-017-9194-z

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/12561

DOI: 10.1007/s12561-017-9194-z

Access Statistics for this article

Statistics in Biosciences is currently edited by Hongyu Zhao and Xihong Lin

More articles in Statistics in Biosciences from Springer, International Chinese Statistical Association
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().