Analysis of Correlations Between Sites in Models of Protein Sequences
B. G. Giraud, Alan Lapedes and Lon Chang Liu
Working Papers from Santa Fe Institute
Abstract:
A criterion based on conditional probabilities, related to the concept of algorithmic distance, is used to detect correlated mutations at noncontiguous sites on sequences. We apply this criterion to the problem of analyzing correlations between sites in protein sequences; however, the analysis applies generally to networks of interacting sites with discrete states at each site. Elementary models, where explicit results can be derived easily, are introduced. The number of states per site considered ranges from two, illustrating the relation to familiar classical spin systems, to twenty states, suitable for representing amino acids. Numerical simulations show that the criterion remains valid even when the genetic history of the data samples (e.g., protein sequences), as represented by a phylogenetic tree, introduces non-independence between samples. Statistical fluctuations due to finite sampling are also investigated and do not invalidate the criterion. A subsidiary result is found: the more homogeneous a population, the more easily its average properties can drift from the properties of its ancestor.
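The abstract does not state the criterion itself, so the following is only an illustrative sketch of the general idea of using conditional probabilities to detect dependence between two sites. It estimates P(state at site j | state at site i) from a set of equal-length sequences and compares it with the marginal P(state at site j). The function names (`conditional_table`, `marginal`, `max_dependence`), the dependence score, and the toy sequences are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

def conditional_table(sequences, i, j):
    """Empirical P(state b at site j | state a at site i), keyed by (a, b).

    Only pairs (a, b) that actually occur in the sample are included.
    """
    pair_counts = Counter((s[i], s[j]) for s in sequences)
    site_i_counts = Counter(s[i] for s in sequences)
    return {(a, b): n / site_i_counts[a] for (a, b), n in pair_counts.items()}

def marginal(sequences, j):
    """Empirical P(state b at site j)."""
    counts = Counter(s[j] for s in sequences)
    total = len(sequences)
    return {b: n / total for b, n in counts.items()}

def max_dependence(sequences, i, j):
    """Hypothetical score: largest |P(b | a) - P(b)| over observed pairs.

    A value of 0 means conditioning on site i does not change the
    observed distribution at site j (no dependence detected).
    """
    cond = conditional_table(sequences, i, j)
    marg = marginal(sequences, j)
    return max(abs(p - marg[b]) for (_a, b), p in cond.items())

# Toy sample (made up): sites 0 and 1 vary together, site 2 varies freely.
toy = ["AKL", "AKV", "GRL", "GRV"]
print(max_dependence(toy, 0, 1))  # coupled sites
print(max_dependence(toy, 0, 2))  # independent sites
```

On this toy sample the score is 0.5 for the coupled pair of sites and 0.0 for the independent pair. The sketch ignores both phylogenetic non-independence between samples and finite-sampling fluctuations, which the abstract identifies as effects the paper investigates.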
Keywords: Proteins; DNA; RNA; correlations; correlation at a distance; mutations; secondary structure; entropy; spin model; statistical dependence
Date: 1998-10
Persistent link: https://EconPapers.repec.org/RePEc:wop:safiwp:98-10-092