Semi-parametric empirical Bayes method for multiplet detection in snATAC-seq with probabilistic multi-omic integration
Yuntian Wu,
Haoran Hu,
Wei Chen,
Johann E Gudjonsson,
Lam C Tsoi and
Xiaoquan Wen
PLOS Computational Biology, 2026, vol. 22, issue 4, 1-16
Abstract:
Multiplets arise when multiple cells are captured within the same droplet during single-cell sequencing, producing hybrid molecular profiles that can distort downstream analyses. Detecting multiplets in single-nucleus ATAC-seq (snATAC-seq) data is particularly challenging due to the sparsity and overdispersion of chromatin accessibility measurements. Moreover, computational approaches that jointly leverage evidence across multiple features and data modalities are highly desirable for multiplet detection. We introduce SEBULA, a semi-parametric empirical Bayes framework for multiplet detection in snATAC-seq data. SEBULA models the singlet background directly from observed chromatin accessibility signals using fragment-level information from snATAC-seq data. This approach avoids reliance on synthetic doublets and produces classification probabilities that enable direct false discovery rate control. We further extend SEBULA to integrate complementary evidence from additional features and modalities, such as simultaneously measured gene expression profiles. Across simulations and seven multimodal datasets with hashing-based ground truth, SEBULA demonstrates improved sensitivity and specificity compared with existing snATAC-seq methods. The evidence integration framework achieves comparable or superior performance relative to state-of-the-art multiomic approaches while maintaining computational efficiency.
Author summary:
Single-cell sequencing has revolutionized biology by allowing researchers to look at the genetic activity of thousands of individual cells simultaneously. However, common technical artifacts occur when two or more cells are accidentally trapped in the same reaction droplet. These “multiplets” create a blurred, hybrid signal that can lead researchers to false biological conclusions. Detecting these artifacts is especially difficult in data that measures chromatin accessibility (i.e., the openness of DNA), which is often sparse and noisy.
We developed SEBULA, a new computational tool designed to solve this problem. Unlike existing methods that rely on simulated data to guess what a multiplet looks like, SEBULA learns the characteristics of true single cells directly from the observed data. This makes it more accurate at spotting subtle multiplet signals that other tools miss. Furthermore, SEBULA is built for the latest multimodal technologies that measure different types of biological information at once. It can combine evidence from multiple sources, such as gene activity and DNA structure, to confirm if a droplet contains a single cell or multiple cells. By providing a more reliable way to identify and remove multiplets, SEBULA helps improve the reliability of downstream analyses in single-cell studies.
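The abstract notes that classification probabilities enable direct false discovery rate control. As a rough illustration of that general idea (not SEBULA's actual implementation or API), the sketch below applies a standard Bayesian FDR rule: droplets are ranked by their posterior probability of being a singlet, and the largest set whose average singlet probability stays at or below a target level is flagged. The function name `bayesian_fdr_select`, its parameters, and the example probabilities are hypothetical and chosen only for this illustration.

```python
import numpy as np


def bayesian_fdr_select(post_multiplet, alpha=0.05):
    """Flag droplets as multiplets with estimated Bayesian FDR <= alpha.

    post_multiplet: posterior probability that each droplet is a multiplet
    (hypothetical input; any calibrated probabilities would do).
    """
    p = np.asarray(post_multiplet, dtype=float)
    lfdr = 1.0 - p  # probability that a flagged droplet is actually a singlet
    order = np.argsort(lfdr, kind="stable")  # most confident calls first
    # Mean lfdr over the k most confident calls estimates the FDR of that set.
    running_fdr = np.cumsum(lfdr[order]) / np.arange(1, p.size + 1)
    # running_fdr is non-decreasing, so count how many prefixes stay <= alpha.
    k = int(np.searchsorted(running_fdr, alpha, side="right"))
    flagged = np.zeros(p.size, dtype=bool)
    flagged[order[:k]] = True
    return flagged


# Made-up posterior probabilities for five droplets.
flags = bayesian_fdr_select([0.99, 0.95, 0.60, 0.10, 0.97], alpha=0.05)
print(flags.tolist())  # [True, True, False, False, True]
```

In this toy example the three most confident calls have an average singlet probability of 0.03, which is within the 0.05 target, while adding the fourth droplet would raise it to about 0.12, so only three droplets are flagged.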
Date: 2026
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013653 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13653&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013653
DOI: 10.1371/journal.pcbi.1013653