Application of Neural Networks and Other Machine Learning Algorithms to DNA Sequence Analysis

A. Lapedes, C. Barnes, C. Burks, R. Farber and K. Sirotkin

Working Papers from Santa Fe Institute

Abstract: In this article we report initial quantitative results on the application of simple neural networks and simple machine learning methods to two problems in DNA sequence analysis. The two problems we consider are:

(1) Determination of whether procaryotic and eucaryotic DNA sequence segments are translated to protein. An accuracy of 99.4% is reported for procaryotic DNA (E. coli) and 98.4% for eucaryotic DNA (H. sapiens genes known to be expressed in liver).

(2) Determination of whether eucaryotic DNA sequence segments containing the dinucleotides "AG" or "GT" are transcribed to RNA splice junctions. An accuracy of 91.2% was achieved on intron/exon splice junctions (acceptor sites) and 94.5% on exon/intron splice junctions (donor sites).

The solution of these two problems, by use of information processing algorithms operating on unannotated base sequences and without recourse to biological laboratory work, is relevant to the Human Genome Project. A variety of neural network, machine learning, and information theoretic algorithms are used. (For the purposes of this article, we view neural networks solely as an information processing procedure and do not consider the possible relation of these formal models to biological networks of neurons.) The accuracies obtained exceed those of previous investigations for which quantitative results are available in the literature. They result from an ongoing program of research that applies machine learning algorithms to the problem of determining the biological function of DNA sequences. Some predictions of possible new genes using these methods are listed, although a complete survey of the H. sapiens and E. coli sections of GenBank using these methods will be given elsewhere.
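
As a rough illustration of how the second problem might be set up, the sketch below finds each "AG" or "GT" dinucleotide in a sequence, cuts a fixed-width window around it, and one-hot encodes the window as a possible classifier input. The flank width, the one-hot encoding, and the names `candidate_windows` and `one_hot` are assumptions made for this example; the abstract does not specify the authors' input representation or network architecture.

```python
from typing import Iterator, List, Tuple

BASES = "ACGT"  # fixed base order for the one-hot encoding (an assumption)


def candidate_windows(seq: str, flank: int = 5) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, dinucleotide, window) for each AG or GT that has
    `flank` bases available on both sides."""
    seq = seq.upper()
    # i is the index of the first base of the dinucleotide; the bounds
    # guarantee the window seq[i - flank : i + 2 + flank] is complete.
    for i in range(flank, len(seq) - flank - 1):
        di = seq[i:i + 2]
        if di in ("AG", "GT"):
            yield i, di, seq[i - flank:i + 2 + flank]


def one_hot(window: str) -> List[int]:
    """Encode each base as four 0/1 values; unknown bases (e.g. N) become all zeros."""
    vec: List[int] = []
    for base in window:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


if __name__ == "__main__":
    for pos, di, win in candidate_windows("CCCCCAGCCCCCGTCCCCC"):
        print(pos, di, win, len(one_hot(win)))
```

Each encoded window could then be labeled as a true or false acceptor ("AG") or donor ("GT") site and passed to any binary classifier.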

Date: 1995-02

Persistent link: https://EconPapers.repec.org/RePEc:wop:safiwp:95-02-011

Handle: RePEc:wop:safiwp:95-02-011