Accurate Bayesian phylogenetic point estimation using a tree distribution parameterized by clade probabilities

Lars Berling, Jonathan Klawitter, Remco Bouckaert, Dong Xie, Alex Gavryushkin and Alexei J Drummond

PLOS Computational Biology, 2025, vol. 21, issue 2, 1-21

Abstract: Bayesian phylogenetic analysis with MCMC algorithms generates an estimate of the posterior distribution of phylogenetic trees in the form of a sample of phylogenetic trees and related parameters. The high dimensionality and non-Euclidean nature of tree space complicate summarizing the central tendency and variance of the posterior distribution in tree space. Here we introduce a new tractable tree distribution and associated point estimator that can be constructed from a posterior sample of trees. Through simulation studies we show that this point estimator performs at least as well as, and often better than, standard methods of producing Bayesian posterior summary trees. We also show that the method of summary that performs best depends on the sample size and dimensionality of the problem in non-trivial ways.

Author summary: Our research introduces novel methods to analyse a set of phylogenetic tree topologies, such as those generated by Bayesian Markov Chain Monte Carlo algorithms. We define a new model for a distribution on trees that is based on observed clade frequencies. We study it together with closely related models that are based on observed clade split frequencies. These distributions are easy to work with and, as we show experimentally, provide excellent estimates of the true posterior distribution. Furthermore, we demonstrate that they enable us to find the tree with the highest posterior probability, which acts as a summary tree or point estimate of the distribution. In simulation studies, we show that the new methods perform at least as well as, or better than, existing methods. Additionally, we highlight that choosing the best method for summarizing sets of trees remains challenging, as it depends on the sample size and complexity of the problem in non-trivial ways. This work has the potential to improve the accuracy of phylogenetic studies.
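The "observed clade frequencies" mentioned in the summary can be illustrated with a minimal sketch. It assumes a sample of rooted trees written as nested Python tuples of leaf names; that representation and the function names `clades` and `clade_frequencies` are hypothetical choices for illustration, not taken from the article. The sketch covers only the counting step and does not reproduce the article's tree distribution or point estimator.

```python
from collections import Counter


def clades(tree):
    """Return (leaf set, list of clades) for a tree given as nested tuples.

    A leaf is a string; an internal node is a tuple of subtrees.
    Each internal node contributes one clade: the set of leaves below it.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def clade_frequencies(trees):
    """Fraction of sampled trees in which each clade appears."""
    counts = Counter()
    for tree in trees:
        _, found = clades(tree)
        counts.update(set(found))  # count each clade once per tree
    n = len(trees)
    return {clade: count / n for clade, count in counts.items()}


# Hypothetical toy sample of three trees on the taxa A, B, C, D.
sample = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
]
freqs = clade_frequencies(sample)
for clade, freq in sorted(freqs.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(clade), round(freq, 3))
```

In this toy sample the clade {A, B} appears in every tree, while {C, D} appears in two of the three. Counting clade splits (a clade together with the pair of child clades it divides into) would follow the same pattern with a different key.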

Date: 2025

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012789 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 12789&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1012789

DOI: 10.1371/journal.pcbi.1012789

Page updated 2025-05-31
Handle: RePEc:plo:pcbi00:1012789