Bayesian Inference for Genomic Data Integration Reduces Misclassification Rate in Predicting Protein-Protein Interactions

Xing, Chuanhua; Dunson, David B

Bayesian Inference for Genomic Data Integration Reduces Misclassification Rate in Predicting Protein-Protein Interactions

Chuanhua Xing and David B Dunson

PLOS Computational Biology, 2011, vol. 7, issue 7, 1-10

Abstract: Protein-protein interactions (PPIs) are essential to most fundamental cellular processes. There has been increasing interest in reconstructing PPIs networks. However, several critical difficulties exist in obtaining reliable predictions. Noticeably, false positive rates can be as high as >80%. Error correction from each generating source can be both time-consuming and inefficient due to the difficulty of covering the errors from multiple levels of data processing procedures within a single test. We propose a novel Bayesian integration method, deemed nonparametric Bayes ensemble learning (NBEL), to lower the misclassification rate (both false positives and negatives) through automatically up-weighting data sources that are most informative, while down-weighting less informative and biased sources. Extensive studies indicate that NBEL is significantly more robust than the classic naïve Bayes to unreliable, error-prone and contaminated data. On a large human data set our NBEL approach predicts many more PPIs than naïve Bayes. This suggests that previous studies may have large numbers of not only false positives but also false negatives. The validation on two human PPIs datasets having high quality supports our observations. Our experiments demonstrate that it is feasible to predict high-throughput PPIs computationally with substantially reduced false positives and false negatives. The ability of predicting large numbers of PPIs both reliably and automatically may inspire people to use computational approaches to correct data errors in general, and may speed up PPIs prediction with high quality. Such a reliable prediction may provide a solid platform to other studies such as protein functions prediction and roles of PPIs in disease susceptibility. Author Summary: Protein interactions are the basic units in almost all biological processes. It is thus vitally important to reconstruct protein-protein interactions (PPIs) before we can fully understand biological processes. However, critical difficulties exist. Particularly the rate of wrongly predicting PPIs to be true (false positive rate) is extremely high in PPIs prediction. The traditional approaches of error correction from each generating source can be both time-consuming and inefficient. We propose a method that can substantially reduce false positive rates by emphasizing information from more reliable data sources, and de-emphasizing less reliable sources. We indicate that it is indeed the case from our extensive studies. Our predictions also suggest that large numbers of not only false positives but also false negatives may exist in previous studies, as validated by two human PPIs datasets having high quality. The ability to predict large numbers of PPIs both reliably and automatically may inspire people to use computational approaches to correct data errors in general, and speed up PPIs prediction with high quality. Reliable prediction from our method may benefit other studies involving such as protein function prediction and roles of PPIs in disease susceptibility.

Date: 2011
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002110 (text/html)
https://journals.plos.org/ploscompbiol/article?id= ... 02110&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1002110

DOI: 10.1371/journal.pcbi.1002110

Access Statistics for this article

More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol ().