Penalized Latent Dirichlet Allocation Model in Single-Cell RNA Sequencing
Xiaotian Wu,
Hao Wu and
Zhijin Wu
Additional contact information
Xiaotian Wu: Brown University
Hao Wu: Emory University
Zhijin Wu: Brown University
Statistics in Biosciences, 2021, vol. 13, issue 3, No 9, 543-562
Abstract:
Single-cell RNA sequencing (scRNA-seq) quantifies RNA transcripts at the individual-cell level, providing cellular-level resolution of gene expression variation. scRNA-seq data are counts of RNA transcripts for all genes in a species’ genome; they are of very high dimension and contain excessive zero counts. To better reduce the data dimension and extract robust and interpretable biological information, we develop a penalized Latent Dirichlet Allocation (pLDA) model for scRNA-seq data. The method is adapted from LDA, a generative probabilistic model that originated in natural language processing. pLDA models scRNA-seq data by treating genes as words, cells as documents, and latent biological functions as topics. It imposes a penalty reflecting a characteristic of scRNA-seq data, namely that only a small subset of genes is expected to be topic-specific, which increases the robustness of the estimation and the interpretability of the results. We apply pLDA to scRNA-seq datasets from both Drop-seq and SMARTer v1 technologies and demonstrate improved performance in cell-type classification. The topics identified by pLDA can be interpreted in terms of biological functions.
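The genes-as-words, cells-as-documents analogy can be made concrete with a minimal sketch that simulates a cells-by-genes count matrix from the plain LDA generative process. This is an illustration only, not the authors' pLDA: it omits the penalty entirely, and the function names, the symmetric Dirichlet priors (0.5 and 0.1) and the fixed per-cell transcript depth are all assumptions chosen for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_counts(n_cells, n_genes, n_topics, depth, seed=0):
    """Simulate counts from plain LDA (no penalty; priors are hypothetical)."""
    rng = random.Random(seed)
    # Each topic (latent biological function) is a distribution over genes.
    topic_gene = [sample_dirichlet([0.1] * n_genes, rng) for _ in range(n_topics)]
    counts, cell_topic = [], []
    for _ in range(n_cells):
        # Each cell (document) has its own mixture over topics.
        theta = sample_dirichlet([0.5] * n_topics, rng)
        row = [0] * n_genes
        for _ in range(depth):
            # Each transcript (word): pick a topic, then a gene from that topic.
            k = rng.choices(range(n_topics), weights=theta)[0]
            g = rng.choices(range(n_genes), weights=topic_gene[k])[0]
            row[g] += 1
        counts.append(row)
        cell_topic.append(theta)
    return counts, cell_topic, topic_gene


counts, cell_topic, topic_gene = simulate_counts(
    n_cells=5, n_genes=20, n_topics=3, depth=200
)
# Fraction of zero entries in the simulated count matrix.
zero_fraction = sum(c == 0 for row in counts for c in row) / (5 * 20)
```

Fitting a model reverses this process: given only `counts`, it estimates the cell-topic weights and topic-gene profiles. According to the abstract, pLDA additionally penalizes the estimation so that only a small subset of genes ends up topic-specific.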
Keywords: Single-cell RNA sequencing; Latent Dirichlet Allocation; Topic models; Genomics; Transcriptomics
Date: 2021
Downloads: (external link)
http://link.springer.com/10.1007/s12561-021-09304-8 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:stabio:v:13:y:2021:i:3:d:10.1007_s12561-021-09304-8
Ordering information: This journal article can be ordered from
http://www.springer.com/journal/12561
DOI: 10.1007/s12561-021-09304-8
Statistics in Biosciences is currently edited by Hongyu Zhao and Xihong Lin
More articles in Statistics in Biosciences from Springer, International Chinese Statistical Association
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.