Use of Artificial Genomes in Assessing Methods for Atypical Gene Detection
Rajeev K Azad and Jeffrey G Lawrence
PLOS Computational Biology, 2005, vol. 1, issue 6, 1-13
Abstract:
Parametric methods for identifying laterally transferred genes exploit the directional mutational biases unique to each genome. Yet the development of new, more robust methods, as well as the evaluation and proper implementation of existing methods, relies on an arbitrary assessment of performance using real genomes, where the evolutionary histories of genes are not known. We have used the framework of a generalized hidden Markov model to create artificial genomes modeled after genuine genomes. To model a genome, “core” genes (those displaying patterns of mutational biases shared among large numbers of genes) are identified by a novel gene clustering approach based on the Akaike information criterion. Gene models derived from multiple “core” gene clusters are used to generate an artificial genome that models the properties of a genuine genome. Chimeric artificial genomes, representing those having experienced lateral gene transfer, were created by combining genes from multiple artificial genomes, and the performance of the parametric methods for identifying “atypical” genes was assessed directly. We found that a hidden Markov model that included multiple gene models, each trained on sets of genes representing the range of genotypic variability within a genome, could produce artificial genomes that mimicked the properties of genuine genomes. Moreover, different methods for detecting foreign genes performed differently (i.e., they had different sets of strengths and weaknesses) when identifying atypical genes within chimeric artificial genomes.
Synopsis:
Bacterial genomes contain genes that come from two sources: although most genes are inherited directly from parent cells at cell division, others may come into the genome from an unrelated organism. Often, these foreign genes can be detected because their sequences have compositional properties that differ from those of other genes in the genome.
Methods for detecting atypical genes are difficult to assess because there are no genuine genomes wherein the histories of all genes are known. Here, the authors describe a method for creating artificial genomes that mimic the properties of genuine genomes, including containing “foreign” genes. The researchers used these constructs (a) to evaluate existing methods for finding foreign genes based on their atypical properties and (b) to test a new method for finding atypical genes. The researchers found that existing methods differ in their abilities to detect genes from different sources and that combining different methods can improve overall performance. The new method for finding atypical genes—which also identified sets of genes that share their unusual properties—worked very well in identifying potentially foreign genes in artificial, chimeric genomes.
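The modeling pipeline described in the abstract (train gene models on clusters of “core” genes, use the Akaike information criterion, AIC = 2p − 2 ln L with lower values preferred, to decide how many clusters the data support, then sample artificial genes from the trained models) can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: it replaces the generalized hidden Markov model's gene models with plain fixed-order nucleotide Markov chains, and it compares one pooled model against a single, given two-way split instead of running a full clustering. All names (train_markov, log_likelihood, aic, generate, biased_gene) and settings (order k = 2, pseudocount 1, GC fractions 0.65 and 0.35) are hypothetical.

```python
import math
import random
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_markov(seqs, k, pseudocount=1.0):
    """Hypothetical gene model: P(next base | previous k bases), add-pseudocount smoothed."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1
    model = {}
    for ctx in ("".join(p) for p in product(ALPHABET, repeat=k)):
        total = sum(counts[ctx].values()) + 4 * pseudocount
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in ALPHABET}
    return model


def log_likelihood(model, seqs, k):
    """Natural-log likelihood of the genes under one order-k model."""
    return sum(math.log(model[s[i - k:i]][s[i]]) for s in seqs for i in range(k, len(s)))


def aic(models_and_seqs, k):
    """AIC = 2p - 2 ln L; each order-k model has 4**k contexts x 3 free probabilities."""
    n_params = len(models_and_seqs) * (4 ** k) * 3
    logl = sum(log_likelihood(m, seqs, k) for m, seqs in models_and_seqs)
    return 2 * n_params - 2 * logl


def generate(model, k, length, rng):
    """Sample an artificial gene from a trained model (uniform random seed context)."""
    seq = "".join(rng.choice(ALPHABET) for _ in range(k))
    while len(seq) < length:
        probs = model[seq[len(seq) - k:]]
        seq += rng.choices(ALPHABET, weights=[probs[b] for b in ALPHABET])[0]
    return seq


def biased_gene(gc, length, rng):
    """Toy training gene with a chosen GC fraction (stands in for real genes)."""
    w = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # weights for A, C, G, T
    return "".join(rng.choices(ALPHABET, weights=w, k=length))


rng = random.Random(1)
group_a = [biased_gene(0.65, 900, rng) for _ in range(20)]
group_b = [biased_gene(0.35, 900, rng) for _ in range(20)]
k = 2

# One model for all genes versus one model per group: lower AIC wins.
pooled = aic([(train_markov(group_a + group_b, k), group_a + group_b)], k)
split = aic([(train_markov(group_a, k), group_a), (train_markov(group_b, k), group_b)], k)
print(f"AIC one model: {pooled:.0f}; two models: {split:.0f}")

# Sample one artificial gene from the GC-rich cluster's model.
artificial = generate(train_markov(group_a, k), k, 900, rng)
gc = sum(b in "GC" for b in artificial) / len(artificial)
print(f"artificial gene GC fraction: {gc:.2f}")
```

Because the two toy groups differ strongly in base composition, the likelihood gained by fitting separate models should far outweigh the AIC penalty for the extra parameters, so the two-model description is preferred. With real genes the clusters are not known in advance and would have to be found, which is the harder problem the paper's clustering approach addresses.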
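Because the origin of every gene in a chimeric artificial genome is known, a detector's sensitivity and specificity can be measured directly, which is exactly what real genomes do not allow. The following minimal sketch assumes the simplest possible parametric score, the mean per-base log-probability of a gene under the genome-wide nucleotide composition, and flags the lowest-scoring genes as atypical; it is not any of the specific methods evaluated in the paper. The host and donor compositions, the gene counts, and the number of genes flagged (n_flag) are hypothetical choices.

```python
import math
import random

ALPHABET = "ACGT"


def random_gene(weights, length, rng):
    """Toy gene drawn from a fixed nucleotide composition (weights for A, C, G, T)."""
    return "".join(rng.choices(ALPHABET, weights=weights, k=length))


def base_freqs(seqs):
    """Genome-wide nucleotide frequencies, with a pseudocount of 1 per base."""
    counts = {b: 1 for b in ALPHABET}
    for s in seqs:
        for b in s:
            counts[b] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def score(gene, freqs):
    """Mean log-probability per base; lower means less typical of the genome."""
    return sum(math.log(freqs[b]) for b in gene) / len(gene)


rng = random.Random(7)
host_w = [0.2, 0.3, 0.3, 0.2]     # hypothetical GC-rich host
donor_w = [0.35, 0.15, 0.15, 0.35]  # hypothetical AT-rich donor

# Chimeric genome: 90 native genes plus 10 "transferred" genes, labels known.
genes = [(random_gene(host_w, 900, rng), False) for _ in range(90)]
genes += [(random_gene(donor_w, 900, rng), True) for _ in range(10)]
rng.shuffle(genes)

freqs = base_freqs(g for g, _ in genes)
ranked = sorted((score(g, freqs), foreign) for g, foreign in genes)

n_flag = 10
flagged = ranked[:n_flag]  # lowest-scoring genes are called atypical
tp = sum(foreign for _, foreign in flagged)
n_foreign = sum(foreign for _, foreign in genes)
n_native = len(genes) - n_foreign

sensitivity = tp / n_foreign                       # foreign genes recovered
specificity = (n_native - (n_flag - tp)) / n_native  # native genes left unflagged
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Practical parametric methods use richer features (codon usage, oligonucleotide frequencies, higher-order Markov models), and a donor whose composition resembles the host's would evade this score entirely. That kind of blind spot is one reason different detectors show different strengths and weaknesses on the same chimeric genome.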
Date: 2005
Downloads:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0010056 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 10056&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:0010056
DOI: 10.1371/journal.pcbi.0010056
Publisher: Public Library of Science