BMDD: A probabilistic framework for accurate imputation of zero-inflated microbiome sequencing data

Huijuan Zhou, Jun Chen and Xianyang Zhang

PLOS Computational Biology, 2025, vol. 21, issue 10, 1-21

Abstract: Microbiome sequencing data are inherently sparse and compositional, with excessive zeros arising from biological absence or insufficient sampling. These zeros pose significant challenges for downstream analyses, particularly those that require log-transformation. We introduce BMDD (BiModal Dirichlet Distribution), a novel probabilistic modeling framework for accurate imputation of microbiome sequencing data. Unlike existing imputation approaches that assume unimodal abundance, BMDD captures the bimodal abundance distribution of the taxa via a mixture of Dirichlet priors. It uses variational inference and a scalable expectation-maximization algorithm for efficient imputation. Through simulations and real microbiome datasets, we demonstrate that BMDD outperforms competing methods in reconstructing true abundances and improves the performance of differential abundance analysis. Through multiple posterior samples, BMDD enables robust inference by accounting for uncertainty in zero imputation. Our method offers a principled and computationally efficient solution for analyzing high-dimensional, zero-inflated microbiome sequencing data and is broadly applicable in microbial biomarker discovery and host-microbiome interaction studies.

Author summary: Understanding the microbes living in and on our bodies (the microbiome) relies on analyzing complex sequencing data. However, these data often contain many zeros, either because a microbe is truly absent or because it was missed due to insufficient sampling. These zeros make it hard to analyze microbial patterns accurately and to identify important differences between groups, especially for methods that work on a log scale. To address this, we developed a new method called BMDD that uses a more realistic model to impute the zeros. Unlike existing tools that assume each microbe follows a unimodal abundance distribution, BMDD allows microbes to follow a bimodal distribution, so they can behave differently in different conditions. It provides not just a single guess but a range of possible values, to better reflect the uncertainty. Our testing shows that BMDD more accurately recovers the true microbial profiles and improves the ability to detect meaningful differences between groups. This method can help researchers gain clearer insights into how the microbiome affects health and disease.
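
To illustrate why a posterior-sampling approach removes the log-transformation problem, here is a minimal sketch in Python. It is not the BMDD algorithm: it uses a single (unimodal) Dirichlet prior with the standard conjugate Dirichlet-multinomial update, whereas BMDD uses a mixture of Dirichlet priors fitted by variational inference and expectation-maximization. The function name, the example counts, and the prior value of 0.5 are all assumptions chosen for illustration.

```python
import numpy as np

def sample_imputed_compositions(counts, alpha, n_draws=5, seed=0):
    """Draw compositions from the conjugate posterior Dirichlet(alpha + counts).

    Simplified, unimodal stand-in for illustration only; BMDD itself
    places a bimodal (mixture) Dirichlet prior on the taxa.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # Every posterior parameter is strictly positive, so every sampled
    # proportion is strictly positive, including taxa observed as zero.
    return rng.dirichlet(alpha + counts, size=n_draws)

# Hypothetical read counts for five taxa in one sample; two are zero.
counts = np.array([120, 0, 35, 0, 845])
alpha = np.full(counts.shape, 0.5)  # assumed prior, not estimated from data

# Multiple posterior draws reflect uncertainty about the zero entries.
draws = sample_imputed_compositions(counts, alpha, n_draws=100)

# The log transform is now well defined for every taxon in every draw.
log_draws = np.log(draws)

# Spread across draws: larger for zero-count taxa on the log scale.
log_sd = log_draws.std(axis=0)
```

Running a downstream analysis once per draw and pooling the results is one way to carry the imputation uncertainty forward, in the spirit of the multiple posterior samples described in the abstract.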

Date: 2025

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013124 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13124&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013124

DOI: 10.1371/journal.pcbi.1013124

More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol.

 
Page updated 2025-10-26
Handle: RePEc:plo:pcbi00:1013124