Using naïve Bayesian classification as a meta-predictor to improve start codon prediction accuracy in prokaryotic organisms
Sean Landman and Imad Rahal
International Journal of Data Mining, Modelling and Management, 2013, vol. 5, issue 3, 246-260
Abstract:
Modern gene location prediction techniques are able to achieve near-perfect accuracy for prokaryotic organisms, but this reported accuracy is generally only for the stop codon locations. Accurate prediction of the start codon locations is more difficult to attain, and different approaches often produce conflicting predictions for the same gene. In this paper, we describe a new approach to resolve these conflicts and improve start codon prediction accuracy. Our approach uses a set of gene location prediction results from other popular prediction approaches to find consistently predicted gene locations. It then uses these consistent genes as a training set for a naïve Bayesian classifier to improve accuracy in the ambiguous genes, those in which there are some inconsistencies in the predicted start codon location among the original predictions. The methods detailed here apply to prokaryotic organisms, using E. coli and the EcoGene Verified Set database as a case study.
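The workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which features the classifier uses, so the sketch assumes a short window of upstream bases plus the start codon itself, and it assumes negative training examples are other in-frame start-codon triplets inside the consistently predicted genes. All function names, the 0-based forward-strand coordinates, and the keying of genes by stop codon position are hypothetical choices made for the example.

```python
import math
from collections import Counter, defaultdict

START_CODONS = {"ATG", "GTG", "TTG"}  # common prokaryotic start codons


def split_by_agreement(predictions):
    """predictions: {predictor_name: {stop_pos: start_pos}}.
    Returns (consistent, ambiguous), with genes keyed by stop position."""
    calls_by_stop = defaultdict(list)
    for calls in predictions.values():
        for stop, start in calls.items():
            calls_by_stop[stop].append(start)
    consistent, ambiguous = {}, {}
    for stop, starts in calls_by_stop.items():
        distinct = sorted(set(starts))
        if len(distinct) > 1:
            ambiguous[stop] = distinct            # predictors disagree on the start
        elif len(starts) == len(predictions):
            consistent[stop] = distinct[0]        # every predictor agrees
    return consistent, ambiguous


def window(genome, start, upstream):
    """Assumed feature vector: `upstream` bases before the start, plus the codon."""
    return genome[start - upstream:start + 3]


def training_windows(genome, consistent, upstream):
    """Positives: agreed starts. Negatives (assumption): other in-frame
    start-codon triplets between the agreed start and the stop."""
    windows, labels = [], []
    for stop, start in consistent.items():
        if start < upstream:
            continue  # not enough upstream sequence for a full window
        windows.append(window(genome, start, upstream))
        labels.append(1)
        for pos in range(start + 3, stop, 3):
            if genome[pos:pos + 3] in START_CODONS:
                windows.append(window(genome, pos, upstream))
                labels.append(0)
    return windows, labels


class NaiveBayes:
    """Categorical naive Bayes over window positions, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, windows, labels):
        self.class_counts = Counter(labels)
        self.counts = defaultdict(Counter)  # (label, position) -> base counts
        for w, y in zip(windows, labels):
            for i, base in enumerate(w):
                self.counts[(y, i)][base] += 1
        return self

    def log_score(self, w, label):
        n = self.class_counts[label]
        score = math.log(n / sum(self.class_counts.values()))  # class prior
        for i, base in enumerate(w):
            # 4 = alphabet size (A, C, G, T)
            score += math.log((self.counts[(label, i)][base] + self.alpha)
                              / (n + 4 * self.alpha))
        return score


def pick_start(model, genome, candidates, upstream):
    """Choose the candidate start with the highest log-odds of being a true start."""
    def log_odds(pos):
        w = window(genome, pos, upstream)
        return model.log_score(w, 1) - model.log_score(w, 0)
    return max(candidates, key=log_odds)
```

In use, one would call `split_by_agreement` on the outputs of the individual predictors, fit `NaiveBayes` on `training_windows` built from the consistent genes, and then call `pick_start` for each ambiguous gene with its list of conflicting candidate starts. Reverse-strand genes and a richer feature set are deliberately left out of the sketch.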
Keywords: gene location prediction; start codon locations; Bayesian classification; meta-predictor; prokaryote; genomics; E. coli; EcoGene; data mining; prokaryotic organisms.
Date: 2013
Downloads: (external link)
http://www.inderscience.com/link.php?id=55864 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:5:y:2013:i:3:p:246-260
More articles in International Journal of Data Mining, Modelling and Management from Inderscience Enterprises Ltd