Predicting Protein Secondary Structure Using Neural Net and Statistical Methods
Paul Stolorz, Alan Lapedes and Yuan Xia
Working Papers from Santa Fe Institute
Abstract:
We present a comparison of neural network methods and Bayesian statistical methods for predicting the secondary structure of proteins from their primary sequence. The Bayesian method makes the unphysical assumption that the probability of an amino acid occurring at each position in the protein is independent of the amino acids occurring elsewhere. Nevertheless, we find the predictive accuracy of the Bayesian method to be only minimally lower than that of the most sophisticated methods used to date.
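The independence assumption described above can be illustrated with a minimal naive Bayes sketch. This is not the authors' implementation: the window half-width, the pseudocount smoothing, the `-` padding symbol, the class letters `H`/`E`/`C`, and the function names `windows`, `train` and `predict` are all assumptions made for illustration. Each residue in a window around the target position contributes an independent factor to the class likelihood.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CLASSES = "HEC"  # helix, sheet, coil (assumed labelling)


def windows(sequence, half_width):
    """Yield (index, window) pairs, padding the sequence ends with '-'."""
    padded = "-" * half_width + sequence + "-" * half_width
    for i in range(len(sequence)):
        yield i, padded[i:i + 2 * half_width + 1]


def train(examples, half_width, pseudocount=1.0):
    """Count classes and (class, window offset, residue) occurrences."""
    class_counts = defaultdict(float)
    residue_counts = defaultdict(float)
    for sequence, structure in examples:
        for i, window in windows(sequence, half_width):
            c = structure[i]
            class_counts[c] += 1
            for offset, residue in enumerate(window):
                residue_counts[(c, offset, residue)] += 1
    return class_counts, residue_counts, pseudocount


def predict(model, sequence, half_width):
    """Pick, per position, the class with the highest log posterior.

    Independence assumption: the window likelihood is a product of
    per-offset residue probabilities, so the log terms simply add.
    """
    class_counts, residue_counts, pseudocount = model
    total = sum(class_counts.values())
    symbols = len(AMINO_ACIDS) + 1  # 20 residues plus the '-' padding symbol
    predicted = []
    for _, window in windows(sequence, half_width):
        best_class, best_score = None, -math.inf
        for c in CLASSES:
            n_c = class_counts.get(c, 0.0)
            score = math.log((n_c + pseudocount) / (total + pseudocount * len(CLASSES)))
            for offset, residue in enumerate(window):
                count = residue_counts.get((c, offset, residue), 0.0)
                score += math.log((count + pseudocount) / (n_c + pseudocount * symbols))
            if score > best_score:
                best_class, best_score = c, score
        predicted.append(best_class)
    return "".join(predicted)


# Toy data, invented purely to exercise the code.
toy_model = train([("AAAAGGGG", "HHHHCCCC")], half_width=0)
toy_prediction = predict(toy_model, "AG", half_width=0)
```

Working in log space avoids numerical underflow when the window is wide, and the pseudocount keeps unseen residue/class combinations from producing a zero probability.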
We present the relationship of neural network methods to Bayesian statistical methods and show that in principle neural methods offer considerable power, although this power is apparently not particularly useful for this problem. In the process, we derive a neural formalism in which the output neurons directly represent the conditional probabilities of structure class. The probabilistic formalism allows introduction of a new objective function, the mutual information, which translates the notion of correlation as a measure of predictive accuracy into a useful training measure. Although this new measure achieves an accuracy similar to that of other approaches (which use a mean square error objective), its accuracy on the training set is significantly, and tantalisingly, higher, even though the number of adjustable parameters remains the same. The mutual information measure predicts a greater fraction of helix and sheet structures correctly than the mean square error measure, at the expense of coil accuracy, precisely as it was designed to do.
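As a rough illustration of mutual information as a measure of agreement between predicted and true structure classes, the sketch below computes the standard empirical mutual information from two label sequences. The abstract does not give the exact form of the training objective, so this is the textbook definition rather than the authors' formulation; the function name and the toy labels are hypothetical.

```python
import math
from collections import Counter


def mutual_information(true_labels, predicted_labels):
    """Empirical mutual information (in bits) between two label sequences.

    I = sum over (t, p) of P(t, p) * log2( P(t, p) / (P(t) * P(p)) )
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have equal length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("label sequences must be non-empty")
    joint = Counter(zip(true_labels, predicted_labels))
    true_marginal = Counter(true_labels)
    predicted_marginal = Counter(predicted_labels)
    total = 0.0
    for (t, p), count in joint.items():
        # count/n is P(t, p); the ratio inside the log is P(t,p) / (P(t) P(p)).
        ratio = count * n / (true_marginal[t] * predicted_marginal[p])
        total += (count / n) * math.log2(ratio)
    return total


# Toy labels, invented for illustration only.
mi_perfect = mutual_information("HHEECC", "HHEECC")
mi_constant = mutual_information("HHEECC", "CCCCCC")
```

A predictor that always outputs coil carries no information about the true class, so its mutual information is zero even though its raw accuracy may look respectable on coil-heavy data. This is one way to see why such a measure rewards helix and sheet predictions.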
By combining the two objective functions, we obtain a marginally improved accuracy of 64.4%, with Matthews coefficients $C_\alpha$, $C_\beta$ and $C_{coil}$ of 0.40, 0.32 and 0.42 respectively. However, since all methods to date perform only slightly better than the Bayes algorithm, which entails the drastic assumption of independence of amino acids, one is forced to conclude that little progress has been made on this problem despite the application of a variety of sophisticated algorithms such as neural networks, and that further advances will require a better understanding of the relevant biophysics.
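For readers unfamiliar with the Matthews coefficient, the following sketch computes it for one structure class treated as a one-versus-rest problem. The function name, the class letters and the toy label strings are assumptions for illustration and do not reproduce the paper's data or results.

```python
import math


def matthews_coefficient(true_labels, predicted_labels, target):
    """Matthews correlation for one class, treating all others as negative."""
    tp = tn = fp = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if p == target and t == target:
            tp += 1
        elif p == target:
            fp += 1
        elif t == target:
            fn += 1
        else:
            tn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a marginal count is zero.
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Toy labels, invented for illustration only.
true_structure = "HHHEECCC"
predicted_structure = "HHCEECCH"
c_alpha = matthews_coefficient(true_structure, predicted_structure, "H")
```

The coefficient ranges from -1 to 1, with 0 corresponding to a prediction no better than chance, which makes it less sensitive to class imbalance than raw percentage accuracy.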
Date: 1995-02
Persistent link: https://EconPapers.repec.org/RePEc:wop:safiwp:95-02-014