Correlated Mutations in Protein Sequences: Phylogenetic and Structural Effects
A. S. Lapedes, B. G. Giraud, L. C. Liu and G. D. Stormo
Working Papers from Santa Fe Institute
Abstract:
Covariation analysis of sets of aligned sequences for RNA molecules is relatively successful in elucidating RNA secondary structure, as well as some aspects of tertiary structure [Gutell(1992)]. Covariation analysis of sets of aligned sequences for protein molecules is successful in certain instances in elucidating certain structural and functional links [Korber(1993)], but in general, pairs of sites displaying highly covarying mutations in protein sequences do not necessarily correspond to sites that are spatially close in the protein structure [Gobel(1994)], [Clark(1995)], [Shindyalov(1994)], [Thomas(1996)], [Taylor(1994)], [Neher(1994)]. In this paper we identify two reasons why naive use of covariation analysis for protein sequences fails to reliably indicate sequence positions that are spatially proximate. The first reason is the bias introduced into the calculation of covariation measures by the fact that biological sequences are generally related by a nontrivial phylogenetic tree. We present a null-model approach to solve this problem. The second reason involves linked chains of covariation, which can result in pairs of sites displaying significant covariation even though they are not spatially proximate. We present a maximum entropy solution to this classic problem of "causation versus correlation." The methodologies are validated in simulation.
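As an illustration of what a "naive" covariation measure between two alignment columns can look like, the following minimal sketch computes the mutual information between pairs of columns. It is an assumption made for illustration only: the abstract does not state which covariation measure the paper uses, the toy alignment and the function names `column` and `mutual_information` are hypothetical, and the sketch deliberately applies neither the null-model correction for phylogeny nor the maximum entropy treatment of linked chains.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Frequencies are plain counts over the sequences, so every sequence is
    treated as an independent sample. That is exactly the assumption that
    a nontrivial phylogenetic tree violates.
    """
    n = len(col_a)
    counts_a = Counter(col_a)
    counts_b = Counter(col_b)
    counts_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in counts_ab.items():
        p_ab = c / n
        p_a = counts_a[a] / n
        p_b = counts_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# column 2 varies independently of column 0, column 3 is conserved.
alignment = ["ACDA", "ACEA", "GTDA", "GTEA"]

mi_01 = mutual_information(column(alignment, 0), column(alignment, 1))
mi_02 = mutual_information(column(alignment, 0), column(alignment, 2))
mi_03 = mutual_information(column(alignment, 0), column(alignment, 3))
print(mi_01, mi_02, mi_03)
```

In this toy case the perfectly covarying pair scores 1 bit and the other pairs score 0. A high score of this kind says nothing by itself about spatial proximity, which is the limitation the abstract describes.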
Keywords: Correlations; causality; mutations
Date: 1997-12
Persistent link: https://EconPapers.repec.org/RePEc:wop:safiwp:97-12-088