Assessing the Accuracy of Ancestral Protein Reconstruction Methods
Paul D Williams,
David D Pollock,
Benjamin P Blackburne and
Richard A Goldstein
PLOS Computational Biology, 2006, vol. 2, issue 6, 1-8
Abstract:
The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.
Synopsis:
It is now possible to apply computational methods to known present-day protein sequences to recreate the sequences of ancestral proteins. By synthesising these proteins and measuring their properties in the laboratory, we can learn much about the nature of evolution, better understand how proteins change and adapt over time, and gain insight into the environments of ancient organisms. Unfortunately, the accuracy of these reconstructions is difficult to evaluate. We simulate protein evolution using a simplified computational model and apply the various reconstruction methods to the sequences that arise from our simulations. Because we have the complete record of the evolutionary history, we can evaluate the reconstruction accuracy directly. We demonstrate that the reconstruction procedures in common use may be biased toward overestimating properties of these ancestral proteins, such as thermostability, which is the opposite of what has previously been assumed. We present an alternative method for generating these sequences, Bayesian sampling, that can eliminate this bias and provide more robust conclusions.
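The core contrast described above can be illustrated with a toy calculation. One approach is a "best guess" reconstruction that takes the most probable residue at each site. The other is Bayesian sampling, which draws each site's residue from its posterior. The sketch below is not the authors' off-lattice simulation. The site posteriors, the additive per-residue "stability contributions", and the function names are all hypothetical. They were chosen only so that the more stabilizing residue at each site is also the more frequent, and therefore more probable, one, which is the situation the abstract attributes to purifying selection. The sketch also assumes the posterior is calibrated, meaning the true ancestral residue at each site is distributed according to that posterior.

```python
import random

# Hypothetical toy model (not the paper's protein model): each site has a
# posterior distribution over candidate residues, and each residue has an
# additive stability contribution (higher = more stable). The most probable
# residue at every site is also the most stabilizing one.
SITES = [
    # (posterior probabilities, stability contribution of each residue)
    ([0.70, 0.20, 0.10], [1.0, 0.6, 0.3]),
    ([0.55, 0.45], [0.8, 0.5]),
    ([0.90, 0.10], [1.2, 0.4]),
    ([0.40, 0.35, 0.25], [0.9, 0.7, 0.2]),
]


def map_reconstruction_stability(sites):
    """'Best guess' reconstruction: keep the most probable residue at every site."""
    total = 0.0
    for probs, contrib in sites:
        best = max(range(len(probs)), key=lambda i: probs[i])
        total += contrib[best]
    return total


def expected_true_stability(sites):
    """Expected stability of the true ancestor, assuming a calibrated posterior."""
    return sum(
        sum(p * c for p, c in zip(probs, contrib)) for probs, contrib in sites
    )


def sampled_reconstruction_stability(sites, rng):
    """Bayesian sampling: draw each site's residue from its posterior distribution."""
    total = 0.0
    for probs, contrib in sites:
        i = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        total += contrib[i]
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sampled_reconstruction_stability(SITES, rng) for _ in range(20000)]
    print("best-guess stability:     ", round(map_reconstruction_stability(SITES), 3))
    print("expected true stability:  ", round(expected_true_stability(SITES), 3))
    print("mean of sampled ancestors:", round(sum(draws) / len(draws), 3))
```

With these made-up numbers, the best-guess sequence scores 3.9, while the expected score of the true ancestor is 3.29. The best-guess reconstruction discards every less probable, slightly destabilizing residue, so it is systematically "too stable". The average over sampled sequences, in contrast, converges on the expected value.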
Date: 2006
Downloads:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0020069 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 20069&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:0020069
DOI: 10.1371/journal.pcbi.0020069
Publisher: Public Library of Science