Disentangling Direct from Indirect Co-Evolution of Residues in Protein Alignments

Lukas Burger and Erik van Nimwegen

PLOS Computational Biology, 2010, vol. 6, issue 1, 1-18

Abstract: Predicting protein structure from primary sequence is one of the ultimate challenges in computational biology. Given the large amount of available sequence data, the analysis of co-evolution, i.e., statistical dependency, between columns in multiple alignments of protein domain sequences remains one of the most promising avenues for predicting residues that are contacting in the structure. A key impediment to this approach is that strong statistical dependencies are also observed for many residue pairs that are distal in the structure. Using a comprehensive analysis of protein domains with available three-dimensional structures we show that co-evolving contacts very commonly form chains that percolate through the protein structure, inducing indirect statistical dependencies between many distal pairs of residues. We characterize the distributions of length and spatial distance traveled by these co-evolving contact chains and show that they explain a large fraction of observed statistical dependencies between structurally distal pairs. We adapt a recently developed Bayesian network model into a rigorous procedure for disentangling direct from indirect statistical dependencies, and we demonstrate that this method not only successfully accomplishes this task, but also allows contacts with weak statistical dependency to be detected. To illustrate how additional information can be incorporated into our method, we incorporate a phylogenetic correction, and we develop an informative prior that takes into account that the probability for a pair of residues to contact depends strongly on their primary-sequence distance and the amount of conservation that the corresponding columns in the multiple alignment exhibit. We show that our model including these extensions dramatically improves the accuracy of contact prediction from multiple sequence alignments.

Author Summary: Whenever two residues are in close contact in the structure of a protein, their interaction will often constrain which amino acid substitutions can occur without perturbing the functionality of the protein, leading to “co-evolution” of the residues. With the large amount of data currently available, deep multiple alignments can be constructed of protein sequences that likely fold into a common structure, and several methods have been proposed for predicting contacting residues from statistical dependencies exhibited by pairs of alignment columns. Unfortunately, strong statistical dependencies are also observed between many pairs of residues that are distal in the structure. Through a comprehensive analysis of 2009 protein domains, we show that a large fraction of these distal dependencies are indirect and result from chains of contacting pairs that percolate through the protein. We present a Bayesian network model that rigorously disentangles direct from indirect dependencies and show that this greatly improves contact prediction. Additionally, we develop an informative prior that takes into account that the probability for residues to be in contact depends on their primary sequence separation, and that highly conserved residues tend to participate in a larger number of contacts. With this prior, the accuracy of the contact predictions is dramatically improved.
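Illustrative note: the idea of an indirect dependency induced by a chain of contacts can be seen in a toy calculation. The sketch below is not the authors' Bayesian network model; it assumes three hypothetical binary "columns" A, B, C in which A influences B and B influences C, with a made-up copy probability of 0.9. All function names are illustrative. It computes the mutual information between A and C, which is positive even though the two never interact directly, and the conditional mutual information given B, which vanishes.

```python
import math
from itertools import product


def chain_joint(copy_prob=0.9):
    """Joint distribution P(a, b, c) for a chain A -> B -> C of binary variables.

    Assumption for illustration: A is uniform, and each downstream variable
    copies its neighbour with probability `copy_prob`.
    """
    joint = {}
    for a, b, c in product((0, 1), repeat=3):
        p_a = 0.5
        p_b = copy_prob if b == a else 1.0 - copy_prob
        p_c = copy_prob if c == b else 1.0 - copy_prob
        joint[(a, b, c)] = p_a * p_b * p_c
    return joint


def marginal(joint, keep):
    """Sum out every variable whose index is not listed in `keep`."""
    out = {}
    for states, p in joint.items():
        key = tuple(states[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def mutual_information(joint, i, j):
    """Mutual information (in bits) between variables i and j."""
    p_ij = marginal(joint, (i, j))
    p_i = marginal(joint, (i,))
    p_j = marginal(joint, (j,))
    return sum(
        p * math.log2(p / (p_i[(x,)] * p_j[(y,)]))
        for (x, y), p in p_ij.items()
        if p > 0
    )


def conditional_mutual_information(joint, i, j, k):
    """Mutual information (in bits) between i and j, conditioned on k."""
    p_ijk = marginal(joint, (i, j, k))
    p_ik = marginal(joint, (i, k))
    p_jk = marginal(joint, (j, k))
    p_k = marginal(joint, (k,))
    return sum(
        p * math.log2(p * p_k[(z,)] / (p_ik[(x, z)] * p_jk[(y, z)]))
        for (x, y, z), p in p_ijk.items()
        if p > 0
    )


if __name__ == "__main__":
    joint = chain_joint(0.9)
    print("I(A;B)   =", mutual_information(joint, 0, 1))  # direct pair
    print("I(A;C)   =", mutual_information(joint, 0, 2))  # indirect pair
    print("I(A;C|B) =", conditional_mutual_information(joint, 0, 2, 1))
```

In this toy chain the indirect pair shows a weaker but clearly non-zero dependency, and conditioning on the intermediate variable removes it, which is the kind of signal a method for separating direct from indirect dependencies would need to account for.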

Date: 2010
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000633 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 00633&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1000633

DOI: 10.1371/journal.pcbi.1000633
