Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging
Armando Fernandes and
Susana Vinga
PLOS ONE, 2016, vol. 11, issue 3, 1-15
Abstract:
The article focus is the improvement of machine learning models capable of predicting protein expression levels based on their codon encoding. Support vector regression (SVR) and partial least squares (PLS) were used to create the models. SVR yields predictions that surpass those of PLS. It is shown that it is possible to improve the models predictive ability by using two more input features, codon identification number and codon count, besides the already used codon bias and minimum free energy. In addition, applying ensemble averaging to the SVR or PLS models also improves the results even further. The present work motivates the test of different ensembles and features with the aim of improving the prediction models whose correlation coefficients are still far from perfect. These results are relevant for the optimization of codon usage and enhancement of protein expression levels in synthetic biology problems.
Date: 2016
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0150369 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 50369&type=printable (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0150369
DOI: 10.1371/journal.pone.0150369
Access Statistics for this article
More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone ().