Quantifying the clusterness and trajectoriness of single-cell RNA-seq data
Hong Seo Lim and Peng Qiu
PLOS Computational Biology, 2024, vol. 20, issue 2, 1-19
Abstract:
Among existing computational algorithms for single-cell RNA-seq analysis, clustering and trajectory inference are two major types of analysis that are routinely applied. For a given dataset, clustering and trajectory inference can generate vastly different visualizations that lead to very different interpretations of the data. To address this issue, we propose multiple scores to quantify the “clusterness” and “trajectoriness” of single-cell RNA-seq data, in other words, whether the data looks like a collection of distinct clusters or a continuum of progression trajectory. The scores we introduce are based on pairwise distance distribution, persistent homology, vector magnitude, Ripley’s K, and degrees of connectivity. Using simulated datasets, we demonstrate that the proposed scores are able to effectively differentiate between cluster-like data and trajectory-like data. Using real single-cell RNA-seq datasets, we demonstrate the scores can serve as indicators of whether clustering analysis or trajectory inference is a more appropriate choice for biological interpretation of the data.
Author summary:
Single-cell sequencing technologies have motivated development of numerous computational algorithms. Two main types of these algorithms are clustering and trajectory inference. When scientists have a scRNA-seq dataset, they usually pick one of these approaches based on what they think the data shows. If they think the data has distinct clusters of cells, they will analyze the data using clustering algorithms. If they think the data shows a continuous progression, they will use trajectory inference algorithms. However, applying clustering and trajectory inference to the same data can sometimes lead to very different interpretations: clustering algorithms produce distinct cell clusters, while trajectory inference on the same data shows continuous trajectories. This makes us wonder: which way of looking at the data is more appropriate?
In this paper, we developed a pipeline for quantifying the “clusterness” and “trajectoriness” of scRNA-seq data, in other words, whether the data looks like a collection of distinct clusters or a continuum of progression trajectory. We think such geometric quantification is an important question that should be broadly discussed in the single-cell research community.
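As a rough illustration of one ingredient named in the abstract, the sketch below computes the textbook naive estimator of Ripley's K (no edge correction) on two toy point sets. It is not the authors' pipeline or their score definition: the function names `pairwise_distances` and `ripley_k`, the toy data, the unit-square area, and the radii are all assumptions chosen for illustration.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix for an (n, d) array of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def ripley_k(points, radii, area):
    """Naive Ripley's K estimate (no edge correction) at each radius.

    K(r) = area / (n * (n - 1)) * #{ordered pairs i != j with d_ij <= r}
    """
    n = len(points)
    dist = pairwise_distances(points)
    off_diag = dist[~np.eye(n, dtype=bool)]  # drop self-distances
    counts = np.array([(off_diag <= r).sum() for r in radii])
    return area * counts / (n * (n - 1))

rng = np.random.default_rng(0)

# Hypothetical cluster-like toy data: three tight Gaussian blobs.
clusters = np.concatenate([
    rng.normal(loc=center, scale=0.03, size=(100, 2))
    for center in [(0.2, 0.2), (0.8, 0.3), (0.5, 0.8)]
])

# Hypothetical trajectory-like toy data: noisy points along a sine curve.
t = rng.uniform(0.0, 1.0, size=300)
curve = np.column_stack([t, 0.5 + 0.3 * np.sin(2 * np.pi * t)])
trajectory = curve + rng.normal(scale=0.03, size=(300, 2))

radii = np.linspace(0.02, 0.5, 25)
k_clusters = ripley_k(clusters, radii, area=1.0)      # area assumed to be 1
k_trajectory = ripley_k(trajectory, radii, area=1.0)  # area assumed to be 1
```

On toy data like this, the two K curves tend to differ most at small radii, where tight clusters contribute many close pairs. How the paper turns such a curve into a single score is not stated in the abstract, so that step is left out here.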
Date: 2024
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011866 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 11866&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1011866
DOI: 10.1371/journal.pcbi.1011866
More articles in PLOS Computational Biology from Public Library of Science