WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning

George L Sutphin, J Matthew Mahoney, Keith Sheppard, David O Walton and Ron Korstanje

PLOS Computational Biology, 2016, vol. 12, issue 11, 1-35

Abstract: The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction, with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species: humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/.

Author Summary: Identifying functionally equivalent proteins between species is a fundamental problem in comparative genetics. While orthology does not guarantee functional equivalence, the identification of orthologs (genes in different organisms that diverged by speciation) is often the first step in approaching this problem. Many methods are available for predicting orthologs. Recent approaches combine methods and filter candidate predictions by “voting”, that is, assigning confidence to ortholog pairs based on the number of predictions by independent methods. Although voting is a heuristic, it maintains precision while increasing recall. Here we employ machine learning to optimize voting by learning which methods make better predictions and, in essence, giving those methods more votes. We present a new tool called WORMHOLE that predicts a strict subclass of orthologs called least diverged orthologs (LDOs) with a high level of functional specificity by learning features of orthology that are encoded in the patterns of predictions made by 17 constituent methods. We validate WORMHOLE using multiple measures of evolutionary divergence and functional relatedness, including community standards provided by the Quest for Orthologs consortium. WORMHOLE’s particular strength lies in predicting LDOs between distantly related species, where orthology is difficult to identify and is of critical importance for comparative biology.
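
Illustrative sketch (not from the paper): the difference between plain voting and learned, weighted voting described in the Author Summary can be shown on toy data. Everything below is an assumption made for illustration: the three hypothetical methods, the made-up vote vectors and labels, and the choice of logistic regression as the learner. This record does not state which model WORMHOLE uses, and the real tool integrates 17 methods.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def vote_fraction(votes):
    """Plain voting: confidence is the fraction of methods predicting the pair."""
    return sum(votes) / len(votes)

def train_weights(samples, labels, epochs=500, lr=0.5):
    """Learn one weight per method with logistic regression (batch gradient descent)."""
    n = len(samples[0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for votes, label in zip(samples, labels):
            z = bias + sum(w * v for w, v in zip(weights, votes))
            err = sigmoid(z) - label
            for i, v in enumerate(votes):
                grad_w[i] += err * v
            grad_b += err
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias

def learned_confidence(votes, weights, bias):
    """Weighted voting: methods with larger learned weights count for more."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, votes)))

# Hypothetical training set: each tuple holds the yes/no votes of three
# made-up methods for one candidate gene pair; labels mark true orthologs.
# Method 0 agrees with the labels, method 1 is noisy, method 2 says yes
# to almost every pair.
samples = [(1, 1, 1), (1, 0, 1), (1, 1, 1), (1, 0, 1),
           (0, 1, 1), (0, 0, 1), (0, 1, 1), (0, 1, 0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_weights(samples, labels)

only_reliable = (1, 0, 0)   # supported by the reliable method alone
two_unreliable = (0, 1, 1)  # supported by the two weaker methods

print("plain votes:  ", vote_fraction(only_reliable), vote_fraction(two_unreliable))
print("learned votes:", learned_confidence(only_reliable, weights, bias),
      learned_confidence(two_unreliable, weights, bias))
```

On this toy data, plain voting ranks the pair backed by two weak methods above the pair backed by the single reliable method, while the learned weights reverse that ranking. This is the sense in which a learner can give better methods "more votes".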

Date: 2016

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005182 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 05182&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1005182

DOI: 10.1371/journal.pcbi.1005182

Series: PLOS Computational Biology, Public Library of Science

Page updated 2025-03-19
Handle: RePEc:plo:pcbi00:1005182