
Reliable estimation of tree branch lengths using deep neural networks

Anton Suvorov and Daniel R Schrider

PLOS Computational Biology, 2024, vol. 20, issue 8, 1-25

Abstract: A phylogenetic tree represents hypothesized evolutionary history for a set of taxa. Besides the branching patterns (i.e., tree topology), phylogenies contain information about the evolutionary distances (i.e., branch lengths) between all taxa in the tree, which include extant taxa (external nodes) and their last common ancestors (internal nodes). During phylogenetic tree inference, the branch lengths are typically co-estimated along with other phylogenetic parameters during tree topology space exploration. There are well-known regions of the branch length parameter space where accurate estimation of phylogenetic trees is especially difficult. Several novel studies have recently demonstrated that machine learning approaches have the potential to help solve phylogenetic problems with greater accuracy and computational efficiency. In this study, as a proof of concept, we sought to explore the possibility of machine learning models to predict branch lengths. To that end, we designed several deep learning frameworks to estimate branch lengths on fixed tree topologies from multiple sequence alignments or their representations. Our results show that deep learning methods can exhibit superior performance in some difficult regions of branch length parameter space. For example, in contrast to maximum likelihood inference, which is typically used for estimating branch lengths, deep learning methods are more efficient and accurate. In general, we find that our neural networks achieve similar accuracy to a Bayesian approach and are the best-performing methods when inferring long branches that are associated with distantly related taxa. Together, our findings represent a next step toward accurate, fast, and reliable phylogenetic inference with machine learning approaches.

Author summary: Phylogenetic trees that delineate organismal relationships serve as a cornerstone structure for almost any basic research leveraging evolutionary information. Besides the tree topology, phylogeneticists are concerned with estimating other fundamental phylogenetic parameters such as the lengths of each branch in the tree. The tree branch lengths are proportional to evolutionary distances between taxa, with long branches representing distantly related taxa and/or accelerated evolution, whereas short branches are indicative of close taxonomic relationships and/or slower evolutionary rates. There is a plethora of phylogenetic methods that can infer branch lengths from sequence data, but they typically exhibit elevated error rates within certain regions of the branch length parameter space and thus in some cases may provide poor estimates. Here, as a proof-of-concept study, we explored the possibility of using artificial neural networks (ANNs) to accurately estimate branch lengths directly from sequence data or its summaries. We show that ANNs can reliably infer branch lengths with accuracy on par with or even better than traditional methods such as Bayesian and maximum likelihood approaches, especially when branches are long. We argue that further investigation of machine learning methods could lead to marked improvements in phylogenetic inference.

Date: 2024

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012337 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 12337&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1012337

DOI: 10.1371/journal.pcbi.1012337

More articles in PLOS Computational Biology from Public Library of Science
 
Page updated 2025-05-03
Handle: RePEc:plo:pcbi00:1012337