Penalized Latent Dirichlet Allocation Model in Single-Cell RNA Sequencing

Xiaotian Wu, Hao Wu and Zhijin Wu
Additional contact information
Xiaotian Wu: Brown University
Hao Wu: Emory University
Zhijin Wu: Brown University

Statistics in Biosciences, 2021, vol. 13, issue 3, No 9, 543-562

Abstract: Single-cell RNA sequencing (scRNA-seq) quantifies RNA transcripts at the individual-cell level, providing cellular-level resolution of gene expression variation. The scRNA-seq data are counts of RNA transcripts of all genes in a species’ genome, which are of very high dimension and contain excessive zero counts. To better reduce the data dimension and extract robust and interpretable biological information, we develop a penalized Latent Dirichlet Allocation (pLDA) model for scRNA-seq data. The method is adapted from LDA, a generative probabilistic model that originated in natural language processing. pLDA models scRNA-seq data by treating genes as words, cells as documents, and latent biological functions as topics. It imposes a penalty to reflect a characteristic of scRNA-seq data, namely that only a small subset of genes is expected to be topic-specific, which increases the robustness of the estimation and the interpretability of the results. We apply pLDA to scRNA-seq datasets from both Drop-seq and SMARTer v1 technologies and demonstrate improved performance in cell-type classification. The topics identified by pLDA are interpretable in terms of biological functions.
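
Illustration (not part of the published abstract): the analogy of genes as words, cells as documents, and biological functions as topics can be made concrete by simulating counts from the standard LDA generative process. The sketch below is a minimal, assumption-laden example and does not implement the authors' penalty or estimation procedure. All function names, parameter values, and the "background plus a few boosted genes" construction of the topics are hypothetical, chosen only to mimic the idea that a small subset of genes is topic-specific.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def make_topics(n_genes, n_topics, n_specific, boost, rng):
    """Build topic-by-gene distributions that share one background and
    differ only on a few topic-specific genes (illustrative assumption)."""
    background = sample_dirichlet([1.0] * n_genes, rng)
    topics = []
    for _ in range(n_topics):
        weights = list(background)
        for g in rng.sample(range(n_genes), n_specific):
            weights[g] *= boost  # up-weight a small set of genes for this topic
        total = sum(weights)
        topics.append([w / total for w in weights])
    return topics


def simulate_counts(n_cells, depth, topics, alpha, rng):
    """Simulate a cells-by-genes count matrix from the LDA generative process."""
    n_topics, n_genes = len(topics), len(topics[0])
    counts = []
    for _ in range(n_cells):
        # Each cell (document) has its own mixture over topics.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        row = [0] * n_genes
        for _ in range(depth):
            # Each transcript (word) picks a topic, then a gene from that topic.
            z = rng.choices(range(n_topics), weights=theta)[0]
            g = rng.choices(range(n_genes), weights=topics[z])[0]
            row[g] += 1
        counts.append(row)
    return counts


rng = random.Random(0)  # fixed seed so the sketch is reproducible
topics = make_topics(n_genes=50, n_topics=3, n_specific=5, boost=20.0, rng=rng)
counts = simulate_counts(n_cells=10, depth=200, topics=topics, alpha=0.5, rng=rng)

# Fraction of zero entries in the simulated cells-by-genes matrix.
zero_fraction = sum(c == 0 for row in counts for c in row) / (10 * 50)
print(f"fraction of zero counts: {zero_fraction:.2f}")
```

Each row of `counts` plays the role of one cell's transcript counts. Fitting a topic model to such a matrix would aim to recover the per-cell topic mixtures and the per-topic gene distributions.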

Keywords: Single-cell RNA sequencing; Latent Dirichlet Allocation; Topic models; Genomics; Transcriptomics
Date: 2021
Downloads: (external link)
http://link.springer.com/10.1007/s12561-021-09304-8 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:stabio:v:13:y:2021:i:3:d:10.1007_s12561-021-09304-8

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/12561

DOI: 10.1007/s12561-021-09304-8

Statistics in Biosciences is currently edited by Hongyu Zhao and Xihong Lin

More articles in Statistics in Biosciences from Springer, International Chinese Statistical Association
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:stabio:v:13:y:2021:i:3:d:10.1007_s12561-021-09304-8