Conditional Regression Based on a Multivariate Zero-Inflated Logistic-Normal Model for Microbiome Relative Abundance Data

Zhigang Li, Katherine Lee, Margaret R. Karagas, Juliette C. Madan, Anne G. Hoen, A. James O’Malley and Hongzhe Li
Additional contact information
Zhigang Li: Geisel School of Medicine at Dartmouth
Katherine Lee: Phillips Exeter Academy
Margaret R. Karagas: Children’s Environmental Health and Disease Prevention Research Center at Dartmouth
Juliette C. Madan: Children’s Environmental Health and Disease Prevention Research Center at Dartmouth
Anne G. Hoen: Geisel School of Medicine at Dartmouth
A. James O’Malley: Geisel School of Medicine at Dartmouth
Hongzhe Li: University of Pennsylvania School of Medicine

Statistics in Biosciences, 2018, vol. 10, issue 3, No 6, 587-608

Abstract: The human microbiome plays critical roles in human health and has been linked to many diseases. While advanced sequencing technologies can characterize the composition of the microbiome in unprecedented detail, it remains challenging to disentangle the complex interplay between the human microbiome and disease risk factors due to the complicated nature of microbiome data. Excessive numbers of zero values, high dimensionality, the hierarchical phylogenetic tree and compositional structure are compounded and consequently make existing methods inadequate to appropriately address these issues. We propose a multivariate two-part zero-inflated logistic-normal model to analyze the association of disease risk factors with individual microbial taxa and overall microbial community composition. This approach can naturally handle excessive numbers of zeros and the compositional data structure with the discrete part and the logistic-normal part of the model. For parameter estimation, an estimating equations approach is employed that enables us to address the complex inter-taxa correlation structure induced by the hierarchical phylogenetic tree structure and the compositional data structure. This model is able to incorporate standard regularization approaches to deal with high dimensionality. Simulation shows that our model outperforms existing methods. Our approach is also compared to others using the analysis of real microbiome data.
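
Illustration (not from the article): the two-part structure described in the abstract can be pictured with a small simulation. The sketch below is a deliberately simplified assumption, not the authors' model or estimation method: taxa are treated as independent, one reference taxon is assumed always present, and the names simulate_sample, presence_probs, logratio_means and logratio_sd are hypothetical. A discrete part decides which taxa are present, and a logistic-normal part assigns relative abundances to the present taxa through normally distributed log-ratios against the reference.

```python
import math
import random


def simulate_sample(presence_probs, logratio_means, logratio_sd, rng):
    """Draw one relative-abundance vector from a toy two-part model.

    Assumptions (illustrative only): taxa are independent, and a final
    reference taxon is always present so the composition is well defined.
    """
    # Part 1 (discrete): each non-reference taxon is present with its own probability.
    present = [rng.random() < p for p in presence_probs]
    # Part 2 (logistic-normal): for present taxa, the log-ratio to the
    # reference taxon is normal; absent taxa contribute exactly zero.
    exp_z = [
        math.exp(rng.gauss(mu, logratio_sd)) if is_present else 0.0
        for mu, is_present in zip(logratio_means, present)
    ]
    total = 1.0 + sum(exp_z)  # the reference taxon contributes exp(0) = 1
    # Inverse additive log-ratio transform: abundances are non-negative and sum to 1.
    return [v / total for v in exp_z] + [1.0 / total]


rng = random.Random(0)
samples = [
    simulate_sample([0.9, 0.5, 0.1], [0.0, -1.0, 1.0], 0.5, rng)
    for _ in range(1000)
]
# Share of samples in which each non-reference taxon is exactly zero.
zero_fractions = [
    sum(s[k] == 0.0 for s in samples) / len(samples) for k in range(3)
]
```

In this toy setup the zeros come only from the discrete part, while the logistic-normal part keeps every sample on the simplex. The article itself addresses correlated taxa and covariate effects through estimating equations, which this sketch does not attempt.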

Keywords: Microbiome data analysis; High dimension; Zero-inflated; Multivariate logistic normal; Relative abundance; Estimating equation
Date: 2018

Downloads: (external link)
http://link.springer.com/10.1007/s12561-018-9219-2 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:stabio:v:10:y:2018:i:3:d:10.1007_s12561-018-9219-2

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/12561

DOI: 10.1007/s12561-018-9219-2

Statistics in Biosciences is currently edited by Hongyu Zhao and Xihong Lin

More articles in Statistics in Biosciences from Springer, International Chinese Statistical Association
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:stabio:v:10:y:2018:i:3:d:10.1007_s12561-018-9219-2