Benchmarking Differential Abundance Tests for 16S microbiome sequencing data using simulated data based on experimental templates

Kohnert, Eva; Kreutz, Clemens

PLOS ONE, 2025, vol. 20, issue 5, 1-22

Abstract: Differential abundance (DA) analysis of metagenomic microbiome data is essential for understanding microbial community dynamics across various environments and hosts. Identifying microorganisms that differ significantly in abundance between conditions (e.g., health vs. disease) is crucial for insights into environmental adaptations, disease development, and host health. However, the statistical interpretation of microbiome data is challenged by their inherent sparsity and compositional nature, necessitating tailored DA methods. This benchmarking study aims to simulate synthetic 16S microbiome data using metaSPARSim (Patuzzi I, Baruzzo G, Losasso C, Ricci A, Di Camillo B. MetaSPARSim: a 16S rRNA gene sequencing count data simulator. BMC Bioinformatics. 2019;20:416. https://doi.org/10.1186/s12859-019-2882-6 PMID: 31757204), MIDASim (He M, Zhao N, Satten GA. MIDASim: a fast and simple simulator for realistic microbiome data. Available from: https://doi.org/10.1101/2023.03.23.533996), and sparseDOSSA2 (Ma S, Ren B, Mallick H, Moon YS, Schwager E, Maharjan S, et al. A statistical model for describing and simulating microbial community profiles. PLOS Comput Biol. 2021;17(9):e1008913. https://doi.org/10.1371/journal.pcbi.1008913 PMID: 34516542), leveraging 38 real-world experimental templates (S3 Table) previously utilized in a benchmark study comparing DA tools. These datasets, drawn from diverse environments such as human gut, soil, and marine habitats, serve as the foundation for our simulation efforts. We employ the same 14 DA tests that were previously used with the same experimental data in benchmark studies, alongside 8 DA tests that were developed subsequently. Initially, we will generate synthetic data closely mirroring the experimental datasets, incorporating a known truth to cover a broad range of real-world data characteristics. This approach allows us to assess the ability of DA methods to recover known true differential abundances.
We will further simulate datasets by altering sparsity, effect size, and sample size, thus creating a comprehensive collection for applying the 22 DA tests. The outcomes, focusing on sensitivities and specificities, will provide insights into the performance of DA tests and their dependencies on sparsity, effect size, and sample size. Additionally, we will calculate data characteristics (S1 and S2 Tables) for each simulated dataset and use multiple regression to identify informative data characteristics influencing test performance. Our prior study, where we used simulated data without incorporating a known truth, demonstrated the feasibility of using synthetic data to validate experimental findings. This current study aims to enhance our understanding by systematically evaluating the impact of incorporating a known truth on DA test performance, thereby providing further information for the selection and application of DA methods in microbiome research.
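
The evaluation described in the abstract compares each DA test's calls against the simulated known truth and summarizes them as sensitivity and specificity. The following is a minimal sketch of that calculation for a single simulated dataset, not the authors' code. The function name, the example p-values, and the 0.05 significance threshold are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    truth: Sequence[bool], called: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one simulated dataset.

    truth[i]  -- True if taxon i was simulated as differentially abundant
    called[i] -- True if the DA test flagged taxon i as significant
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")

    tp = sum(t and c for t, c in zip(truth, called))          # true positives
    fn = sum(t and not c for t, c in zip(truth, called))      # false negatives
    tn = sum(not t and not c for t, c in zip(truth, called))  # true negatives
    fp = sum(not t and c for t, c in zip(truth, called))      # false positives

    # Undefined rates (no positives or no negatives) are reported as NaN.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical example: 6 taxa, the first 3 simulated as truly differential.
truth = [True, True, True, False, False, False]
p_values = [0.001, 0.20, 0.03, 0.04, 0.60, 0.90]  # made-up test output
alpha = 0.05  # assumed significance threshold
called = [p < alpha for p in p_values]

sens, spec = sensitivity_specificity(truth, called)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this toy case two of the three truly differential taxa are detected and one of the three null taxa is falsely flagged, so both rates equal 2/3. Repeating the calculation across simulated datasets with varied sparsity, effect size, and sample size would yield the kind of performance table the abstract describes.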

Date: 2025

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0321452 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 21452&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0321452

DOI: 10.1371/journal.pone.0321452
