Semi-parametric empirical Bayes method for multiplet detection in snATAC-seq with probabilistic multi-omic integration

Yuntian Wu, Haoran Hu, Wei Chen, Johann E Gudjonsson, Lam C Tsoi and Xiaoquan Wen

PLOS Computational Biology, 2026, vol. 22, issue 4, 1-16

Abstract: Multiplets arise when multiple cells are captured within the same droplet during single-cell sequencing, producing hybrid molecular profiles that can distort downstream analyses. Detecting multiplets in single-nucleus ATAC-seq (snATAC-seq) data is particularly challenging due to the sparsity and overdispersion of chromatin accessibility measurements. Moreover, computational approaches that jointly leverage evidence across multiple features and data modalities are highly desirable for multiplet detection. We introduce SEBULA, a semi-parametric empirical Bayes framework for multiplet detection in snATAC-seq data. SEBULA models the singlet background directly from observed chromatin accessibility signals using fragment-level information from snATAC-seq data. This approach avoids reliance on synthetic doublets and produces classification probabilities that enable direct false discovery rate control. We further extend SEBULA to integrate complementary evidence from additional features and modalities, such as simultaneously measured gene expression profiles. Across simulations and seven multimodal datasets with hashing-based ground truth, SEBULA demonstrates improved sensitivity and specificity compared with existing snATAC-seq methods. The evidence integration framework achieves comparable or superior performance relative to state-of-the-art multiomic approaches while maintaining computational efficiency.

Author summary: Single-cell sequencing has revolutionized biology by allowing researchers to look at the genetic activity of thousands of individual cells simultaneously. However, common technical artifacts occur when two or more cells are accidentally trapped in the same reaction droplet. These “multiplets” create a blurred, hybrid signal that can lead researchers to false biological conclusions. Detecting these artifacts is especially difficult in data that measures chromatin accessibility (i.e., the openness of DNA), which is often sparse and noisy.
We developed SEBULA, a new computational tool designed to solve this problem. Unlike existing methods that rely on simulated data to guess what a multiplet looks like, SEBULA learns the characteristics of true single cells directly from the observed data. This makes it more accurate at spotting subtle multiplet signals that other tools miss. Furthermore, SEBULA is built for the latest multimodal technologies that measure different types of biological information at once. It can combine evidence from multiple sources, such as gene activity and DNA structure, to confirm if a droplet contains a single cell or multiple cells. By providing a more reliable way to identify and remove multiplets, SEBULA helps improve the reliability of downstream analyses in single-cell studies.
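
The abstract mentions two ideas that a small example may make more concrete: classification probabilities that allow direct false discovery rate control, and the integration of evidence across modalities. The sketch below is not SEBULA's implementation, which this listing does not describe. It assumes, purely for illustration, that each modality supplies a log Bayes factor (multiplet versus singlet) per droplet, that modalities are conditionally independent given droplet status, and that the prior multiplet rate is known. The function names, the prior of 0.1, and the toy log Bayes factors are all hypothetical.

```python
import math


def posterior_multiplet_prob(log_bayes_factors, prior_multiplet):
    """Posterior probability that a droplet is a multiplet.

    Assumption (not from the paper): per-modality log Bayes factors add,
    i.e. modalities are conditionally independent given droplet status.
    """
    log_prior_odds = math.log(prior_multiplet) - math.log1p(-prior_multiplet)
    log_post_odds = log_prior_odds + sum(log_bayes_factors)
    # Numerically stable logistic transform of the posterior log odds.
    if log_post_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_post_odds))
    e = math.exp(log_post_odds)
    return e / (1.0 + e)


def bayesian_fdr_calls(post_probs, alpha):
    """Indices called as multiplets under Bayesian FDR control.

    Droplets are ranked by local fdr (1 - posterior probability), and calls
    are added while the running mean local fdr stays at or below alpha.
    """
    order = sorted(range(len(post_probs)), key=lambda i: 1.0 - post_probs[i])
    calls = []
    running_lfdr = 0.0
    for rank, i in enumerate(order, start=1):
        running_lfdr += 1.0 - post_probs[i]
        # The running mean is non-decreasing, so the first exceedance ends the list.
        if running_lfdr / rank > alpha:
            break
        calls.append(i)
    return calls


# Hypothetical toy input: (ATAC, RNA) log Bayes factors for five droplets.
toy_log_bfs = [(4.0, 3.0), (2.0, 0.5), (-1.0, -2.0), (0.5, 0.0), (6.0, 5.0)]
probs = [posterior_multiplet_prob(bfs, prior_multiplet=0.1) for bfs in toy_log_bfs]
calls = bayesian_fdr_calls(probs, alpha=0.05)
print(calls)
```

Because calls are made on posterior probabilities, the expected proportion of false calls among the selected droplets is simply their average local fdr, which is why no synthetic doublets or separate null simulation are needed in this simplified setting.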

Date: 2026

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013653 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13653&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013653

DOI: 10.1371/journal.pcbi.1013653

More articles in PLOS Computational Biology from Public Library of Science

 
Page updated 2026-05-03
Handle: RePEc:plo:pcbi00:1013653