Non-linear archetypal analysis of single-cell RNA-seq data by deep autoencoders
Yuge Wang and Hongyu Zhao
PLOS Computational Biology, 2022, vol. 18, issue 4, 1-31
Abstract:
Advances in single-cell RNA sequencing (scRNA-seq) have led to successes in discovering novel cell types and understanding cellular heterogeneity among complex cell populations through cluster analysis. However, cluster analysis cannot reveal a continuous spectrum of states or the underlying gene expression programs (GEPs) shared across cell types. We introduce scAAnet, an autoencoder for single-cell non-linear archetypal analysis, to identify GEPs and infer the relative activity of each GEP across cells. We use a count distribution-based loss term to account for the sparsity and overdispersion of the raw count data and add an archetypal constraint to the loss function of scAAnet. We first show through simulations that scAAnet outperforms existing methods for archetypal analysis across different metrics. We then demonstrate the ability of scAAnet to extract biologically meaningful GEPs using publicly available scRNA-seq datasets, including a pancreatic islet dataset, a lung idiopathic pulmonary fibrosis dataset and a prefrontal cortex dataset.
Author summary:
Single-cell RNA sequencing (scRNA-seq) techniques enable the profiling of gene expression at the single-cell level, and thus make it possible to uncover the cellular heterogeneity in a complex cell population composed of multiple cell types. Due to the complexity of biological systems, different cell types may share underlying gene expression programs (GEPs) at different levels. However, such shared patterns are difficult to study by traditional cluster analysis. Based on the assumption that the expression profile of each cell results from a non-linear combination of multiple GEPs, we develop scAAnet, a deep learning model for non-linear archetypal decomposition of scRNA-seq data. We demonstrate that scAAnet both achieves better decomposition performance in simulated data and identifies biologically meaningful GEPs that are either cell-type-specific or disease-enriched in three real scRNA-seq datasets. To help interpret results from scAAnet, we also provide downstream analysis tools for the identification of program-specific marker genes. We expect that scAAnet can be applied to explore GEPs shared across cells when scRNA-seq is used to study a complex disease or biological system.
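The abstract names two ingredients: a count distribution-based loss and an archetypal constraint. The sketch below is a rough illustration of how these might fit together. It is not the scAAnet implementation. It assumes a negative binomial likelihood (the abstract does not say which count distribution is used), assumes the archetypal constraint can be expressed by placing each cell's GEP usage on the probability simplex via a softmax, and replaces the deep decoder with a single exponential map. All function and variable names (`nb_log_pmf`, `usage`, `archetypes`) are hypothetical.

```python
import math
import numpy as np

# Element-wise log-gamma without a SciPy dependency.
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def softmax(z, axis=-1):
    """Map real-valued scores to non-negative weights that sum to 1."""
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def nb_log_pmf(counts, mean, dispersion, eps=1e-8):
    """Negative binomial log-probability, variance = mean + mean**2 / dispersion."""
    return (
        _lgamma(counts + dispersion) - _lgamma(dispersion) - _lgamma(counts + 1.0)
        + dispersion * np.log(dispersion / (dispersion + mean) + eps)
        + counts * np.log(mean / (dispersion + mean) + eps)
    )


def nb_loss(counts, mean, dispersion):
    """Mean negative log-likelihood over all cells and genes."""
    return -nb_log_pmf(counts, mean, dispersion).mean()


rng = np.random.default_rng(0)
n_cells, n_genes, n_programs = 5, 4, 3

# Stand-in for an encoder output: one score per cell and program (assumption).
logits = rng.normal(size=(n_cells, n_programs))
# Simplex constraint: each cell's program usage is a convex combination.
usage = softmax(logits, axis=1)

# Stand-in for learned programs in log-expression space (assumption).
archetypes = rng.normal(size=(n_programs, n_genes))
# Toy non-linear decoder: exponentiate the convex combination to get NB means.
mean = np.exp(usage @ archetypes)

# Simulated raw counts, used only to exercise the loss.
counts = rng.poisson(mean).astype(float)
loss = nb_loss(counts, mean, dispersion=2.0)
```

In this toy setup, each row of `usage` plays the role of the relative activity of each GEP in one cell, and `loss` is the quantity a training loop would minimize with respect to the encoder and decoder parameters.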
Date: 2022
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010025 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 10025&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1010025
DOI: 10.1371/journal.pcbi.1010025
More articles in PLOS Computational Biology from Public Library of Science