Bayesian modelling of compositional heterogeneity in molecular phylogenetics

Heaps Sarah E., Nye Tom M.W., Boys Richard J., Williams Tom A. and Embley T. Martin
Additional contact information
Heaps Sarah E.: School of Mathematics and Statistics, Herschel Building, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK; Institute for Cell and Molecular Biosciences, Medical School, Newcastle University, Catherine Cookson Building, Framlington Place, Newcastle upon Tyne, NE2 4HH, UK
Nye Tom M.W.: School of Mathematics and Statistics, Herschel Building, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
Boys Richard J.: School of Mathematics and Statistics, Herschel Building, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
Williams Tom A.: Institute for Cell and Molecular Biosciences, Medical School, Newcastle University, Catherine Cookson Building, Framlington Place, Newcastle upon Tyne, NE2 4HH, UK
Embley T. Martin: Institute for Cell and Molecular Biosciences, Medical School, Newcastle University, Catherine Cookson Building, Framlington Place, Newcastle upon Tyne, NE2 4HH, UK

Statistical Applications in Genetics and Molecular Biology, 2014, vol. 13, issue 5, 589-609

Abstract: In molecular phylogenetics, standard models of sequence evolution generally assume that sequence composition remains constant over evolutionary time. However, this assumption is violated in many datasets which show substantial heterogeneity in sequence composition across taxa. We propose a model which allows compositional heterogeneity across branches, and formulate the model in a Bayesian framework. Specifically, the root and each branch of the tree is associated with its own composition vector whilst a global matrix of exchangeability parameters applies everywhere on the tree. We encourage borrowing of strength between branches by developing two possible priors for the composition vectors: one in which information can be exchanged equally amongst all branches of the tree and another in which more information is exchanged between neighbouring branches than between distant branches. We also propose a Markov chain Monte Carlo (MCMC) algorithm for posterior inference which uses data augmentation of substitutional histories to yield a simple complete data likelihood function that factorises over branches and allows Gibbs updates for most parameters. Standard phylogenetic models are not informative about the root position. Therefore a significant advantage of the proposed model is that it allows inference about rooted trees. The position of the root is fundamental to the biological interpretation of trees, both for polarising trait evolution and for establishing the order of divergence among lineages. Furthermore, unlike some other related models from the literature, inference in the model we propose can be carried out through a simple MCMC scheme which does not require problematic dimension-changing moves. We investigate the performance of the model and priors in analyses of two alignments for which there is strong biological opinion about the tree topology and root position.
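
The abstract describes branch-specific composition vectors combined with a single global matrix of exchangeability parameters. As a rough illustration only, the Python sketch below builds one rate matrix per branch using a common GTR-style parameterisation, in which the rate from state i to state j is the exchangeability of the pair multiplied by the branch's frequency of j. This parameterisation, the function names, the branch labels and all numerical values are assumptions made for the example and are not taken from the paper.

```python
# Sketch only: GTR-style rate matrices built from shared exchangeabilities
# and branch-specific composition vectors. The parameterisation, names and
# numbers are illustrative assumptions, not taken from the paper.


def rate_matrix(exchangeabilities, composition):
    """Return Q with Q[i][j] = rho[i][j] * pi[j] for i != j and rows summing to zero."""
    n = len(composition)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exchangeabilities[i][j] * composition[j]
        # The diagonal is still 0.0 here, so this is minus the off-diagonal row sum.
        q[i][i] = -sum(q[i])
    return q


def stationary_flux(q, composition):
    """Return the vector pi * Q; it is (numerically) zero when pi is stationary for Q."""
    n = len(composition)
    return [sum(composition[i] * q[i][j] for i in range(n)) for j in range(n)]


# Symmetric exchangeabilities shared by every branch (state order A, C, G, T).
RHO = [
    [0.0, 1.0, 2.0, 1.0],
    [1.0, 0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 2.0, 1.0, 0.0],
]

# Hypothetical composition vectors: one for the root and one per branch.
COMPOSITIONS = {
    "root": [0.25, 0.25, 0.25, 0.25],
    "branch_1": [0.10, 0.40, 0.40, 0.10],  # GC-rich
    "branch_2": [0.40, 0.10, 0.10, 0.40],  # AT-rich
}

# In this sketch the root vector is kept as a plain distribution and gets no
# rate matrix of its own (an assumption); each branch gets its own Q.
BRANCH_Q = {
    name: rate_matrix(RHO, pi)
    for name, pi in COMPOSITIONS.items()
    if name != "root"
}

if __name__ == "__main__":
    for name, q in BRANCH_Q.items():
        print(name)
        for row in q:
            print("  " + "  ".join(f"{x:7.3f}" for x in row))
```

Because the exchangeabilities are symmetric, each branch's composition vector is the stationary distribution of that branch's rate matrix in this sketch, so a lineage evolving along a branch drifts towards the composition assigned to it. The process as a whole is then non-stationary across the tree, which is consistent with the abstract's point that such a model can be informative about the root position.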

Keywords: bacterial evolution; marginal likelihood; phylogenetics; root; tree of life
Date: 2014
Downloads: https://doi.org/10.1515/sagmb-2013-0077
For access to full text, subscription to the journal or payment for the individual article is required.

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:13:y:2014:i:5:p:21:n:5

Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html

DOI: 10.1515/sagmb-2013-0077

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

Page updated 2025-03-19
Handle: RePEc:bpj:sagmbi:v:13:y:2014:i:5:p:21:n:5