BMDD: A probabilistic framework for accurate imputation of zero-inflated microbiome sequencing data
Huijuan Zhou, Jun Chen and Xianyang Zhang
PLOS Computational Biology, 2025, vol. 21, issue 10, 1-21
Abstract:
Microbiome sequencing data are inherently sparse and compositional, with excessive zeros arising from biological absence or insufficient sampling. These zeros pose significant challenges for downstream analyses, particularly those that require log transformation. We introduce BMDD (BiModal Dirichlet Distribution), a novel probabilistic modeling framework for accurate imputation of microbiome sequencing data. Unlike existing imputation approaches that assume unimodal abundance, BMDD captures the bimodal abundance distribution of the taxa via a mixture of Dirichlet priors. It uses variational inference and a scalable expectation-maximization algorithm for efficient imputation. Through simulations and real microbiome datasets, we demonstrate that BMDD outperforms competing methods in reconstructing true abundances and improves the performance of differential abundance analysis. Through multiple posterior samples, BMDD enables robust inference by accounting for uncertainty in zero imputation. Our method offers a principled and computationally efficient solution for analyzing high-dimensional, zero-inflated microbiome sequencing data and is broadly applicable in microbial biomarker discovery and host-microbiome interaction studies.
Author summary:
Understanding the microbes living in and on our bodies (the microbiome) relies on analyzing complex sequencing data. However, these data often contain many zeros, either because a microbe is truly absent or because it was missed due to insufficient sampling. These missing values make it hard to accurately analyze microbial patterns and identify important differences between groups, especially for methods that work on a log scale. To address this, we developed a new method called BMDD that uses a more realistic model to impute the zeros. Unlike existing tools that assume each microbe follows a unimodal abundance distribution, BMDD allows each microbe's abundance to follow a bimodal distribution, so it can behave differently under different conditions. It provides not just a single guess but a range of possible values that reflects the uncertainty. Our testing shows that BMDD more accurately recovers the true microbial profiles and improves the ability to detect meaningful differences between groups. This method can help researchers gain clearer insights into how the microbiome affects health and disease.
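The abstract's central idea, imputing zeros by sampling taxon proportions from a posterior under a bimodal Dirichlet-type prior rather than adding a fixed pseudocount, can be illustrated with a small sketch. It is not BMDD's algorithm. The bimodal prior is assumed here, for illustration only, to be a Dirichlet whose parameter for each taxon switches between a hypothetical "low" and "high" value via an independent Bernoulli indicator. The hyperparameters are fixed by hand instead of being estimated by EM, and the posterior over the indicators is computed by brute-force enumeration. Because the Dirichlet-multinomial likelihood couples all taxa, this enumeration grows as 2^m and is only practical for a handful of taxa. Given the indicators, Dirichlet-multinomial conjugacy yields p ~ Dirichlet(alpha_z + x), so every sampled proportion is strictly positive and the centered log-ratio (CLR) transform is defined. All function names, counts, and hyperparameter values are hypothetical.

```python
import itertools
import math
import random


def log_dirmult_kernel(counts, alpha):
    """Log Dirichlet-multinomial likelihood of `counts` under `alpha`,
    dropping the multinomial coefficient (it does not depend on alpha)."""
    a_sum, n = sum(alpha), sum(counts)
    out = math.lgamma(a_sum) - math.lgamma(a_sum + n)
    for x, a in zip(counts, alpha):
        out += math.lgamma(a + x) - math.lgamma(a)
    return out


def posterior_over_modes(counts, alpha_low, alpha_high, pi):
    """Exact posterior over per-taxon mode indicators z (0 = low, 1 = high),
    by enumerating all 2**m configurations (hypothetical model, small m only)."""
    configs, logw = [], []
    for z in itertools.product((0, 1), repeat=len(counts)):
        alpha = [hi if zj else lo for zj, lo, hi in zip(z, alpha_low, alpha_high)]
        log_prior = sum(math.log(p if zj else 1.0 - p) for zj, p in zip(z, pi))
        configs.append((z, alpha))
        logw.append(log_prior + log_dirmult_kernel(counts, alpha))
    top = max(logw)  # subtract the max before exponentiating for stability
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return configs, [v / total for v in w]


def sample_proportions(counts, alpha_low, alpha_high, pi, n_draws, rng):
    """Draw proportions from the mixture-of-Dirichlet posterior: pick a mode
    configuration z, then p ~ Dirichlet(alpha_z + counts) via normalized gammas."""
    configs, weights = posterior_over_modes(counts, alpha_low, alpha_high, pi)
    draws = []
    for _ in range(n_draws):
        _, alpha = rng.choices(configs, weights=weights, k=1)[0]
        g = [rng.gammavariate(a + x, 1.0) for a, x in zip(alpha, counts)]
        s = sum(g)
        draws.append([v / s for v in g])
    return draws


def clr(p):
    """Centered log-ratio transform; defined because every draw is > 0."""
    logs = [math.log(v) for v in p]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


# Hypothetical single sample: read counts for five taxa, two of them zero.
rng = random.Random(0)
counts = [120, 0, 35, 0, 8]
alpha_low = [0.5] * 5   # hypothetical "rare" mode parameters
alpha_high = [20.0] * 5  # hypothetical "abundant" mode parameters
pi = [0.5] * 5          # hypothetical prior probability of the high mode
draws = sample_proportions(counts, alpha_low, alpha_high, pi, n_draws=200, rng=rng)
clr_draws = [clr(p) for p in draws]  # one log-scale profile per imputation
```

Each element of `clr_draws` is one imputed, log-transformed profile. Running a downstream analysis on each draw and pooling the results is one way to carry imputation uncertainty forward, in the spirit of the multiple posterior samples the abstract describes.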
Date: 2025
Downloads:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013124 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13124&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013124
DOI: 10.1371/journal.pcbi.1013124
More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol.