Sparse and Compositionally Robust Inference of Microbial Ecological Networks
Zachary D Kurtz,
Christian L Müller,
Emily R Miraldi,
Dan R Littman,
Martin J Blaser and
Richard A Bonneau
PLOS Computational Biology, 2015, vol. 11, issue 5, 1-25
Abstract:
16S ribosomal RNA (rRNA) gene and other environmental sequencing techniques provide snapshots of microbial communities, revealing phylogeny and the abundances of microbial populations across diverse ecosystems. While changes in microbial community structure are demonstrably associated with certain environmental conditions (from metabolic and immunological health in mammals to ecological stability in soils and oceans), identification of underlying mechanisms requires new statistical tools, as these datasets present several technical challenges. First, the abundances of microbial operational taxonomic units (OTUs) from amplicon-based datasets are compositional. Counts are normalized to the total number of counts in the sample. Thus, microbial abundances are not independent, and traditional statistical metrics (e.g., correlation) for the detection of OTU-OTU relationships can lead to spurious results. Secondly, microbial sequencing-based studies typically measure hundreds of OTUs on only tens to hundreds of samples; thus, inference of OTU-OTU association networks is severely under-powered, and additional information (or assumptions) are required for accurate inference. Here, we present SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets that addresses both of these issues. SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological association network is sparse. To reconstruct the network, SPIEC-EASI relies on algorithms for sparse neighborhood and inverse covariance selection. To provide a synthetic benchmark in the absence of an experimentally validated gold-standard network, SPIEC-EASI is accompanied by a set of computational tools to generate OTU count data from a set of diverse underlying network topologies. 
SPIEC-EASI outperforms state-of-the-art methods to recover edges and network properties on synthetic data under a variety of scenarios. SPIEC-EASI also reproducibly predicts previously unknown microbial associations using data from the American Gut project.
Author Summary: Genomic survey of microbes by 16S rRNA gene sequencing and metagenomics has inspired appreciation for the role of complex communities in diverse ecosystems. However, due to the unique properties of community composition data, standard data analysis tools are likely to produce statistical artifacts. For a typical experiment studying microbial ecosystems these artifacts can lead to erroneous conclusions about patterns of associations between microbial taxa. We developed a new procedure that seeks to infer ecological associations between microbial populations, by 1) taking advantage of the proportionality invariance of relative abundance data and 2) making assumptions about the underlying network structure when the number of taxa in the dataset is larger than the number of sampled communities. Additionally, we employed a novel tool to generate biologically plausible synthetic data and objectively benchmark current association inference tools. Finally, we tested our procedures on a large-scale 16S rRNA gene sequencing dataset sampled from the human gut.
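The proportionality invariance mentioned in the summary can be illustrated with a small sketch. The abstract does not name the transformation it uses; the centered log-ratio (clr), a standard transformation in compositional data analysis, is assumed here. The function names `clr` and `close`, the example counts, and the pseudocount of 1 are illustrative assumptions, not the authors' implementation, and the sparse inverse covariance step is not shown.

```python
import math


def clr(values):
    """Centered log-ratio transform of one sample of strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("clr needs strictly positive values; add a pseudocount first")
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [x - mean_log for x in logs]


def close(values):
    """Normalize a sample to relative abundances that sum to 1."""
    total = sum(values)
    return [v / total for v in values]


# Hypothetical OTU counts for one sample (made-up numbers). A pseudocount of 1
# (an assumption of this sketch) gives the zero count a finite logarithm.
counts = [120, 30, 0, 850]
shifted = [c + 1 for c in counts]

clr_counts = clr(shifted)        # clr of the raw (shifted) counts
clr_props = clr(close(shifted))  # clr of the same sample as proportions

# The two results agree up to floating-point error: rescaling a sample by any
# positive constant leaves its clr coordinates unchanged.
max_diff = max(abs(a - b) for a, b in zip(clr_counts, clr_props))
print(max_diff < 1e-9)
```

Because the clr coordinates of a sample always sum to zero, the covariance matrix of clr-transformed data is singular, which is one reason a regularized (sparse) estimator is needed rather than a plain matrix inverse.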
Date: 2015
Downloads:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004226 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 04226&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1004226
DOI: 10.1371/journal.pcbi.1004226
More articles in PLOS Computational Biology from Public Library of Science