An integrative approach to protein sequence design through multiobjective optimization
Lu Hong and Tanja Kortemme
PLOS Computational Biology, 2024, vol. 20, issue 7, 1-37
Abstract:
With recent methodological advances in the field of computational protein design, in particular those based on deep learning, there is an increasing need for frameworks that allow for coherent, direct integration of different models and objective functions into the generative design process. Here we demonstrate how evolutionary multiobjective optimization techniques can be adapted to provide such an approach. With the established Non-dominated Sorting Genetic Algorithm II (NSGA-II) as the optimization framework, we use AlphaFold2 and ProteinMPNN confidence metrics to define the objective space, and a mutation operator composed of ESM-1v and ProteinMPNN to rank and then redesign the least favorable positions. Using the two-state design problem of the fold-switching protein RfaH as an in-depth case study, and PapD and calmodulin as examples of higher-dimensional design problems, we show that the evolutionary multiobjective optimization approach leads to significant reduction in the bias and variance in RfaH native sequence recovery, compared to a direct application of ProteinMPNN. We suggest that this improvement is due to three factors: (i) the use of an informative mutation operator that accelerates the sequence space exploration, (ii) the parallel, iterative design process inherent to the genetic algorithm that improves upon the ProteinMPNN autoregressive sequence decoding scheme, and (iii) the explicit approximation of the Pareto front that leads to optimal design candidates representing diverse tradeoff conditions. We anticipate this approach to be readily adaptable to different models and broadly relevant for protein design tasks with complex specifications.
Author summary:
Proteins are the fundamental building blocks of life, and engineering them has broad applications in medicine and biotechnology. Computational methods that seek to model and predict the protein sequence-structure-function relationship have advanced significantly with recent developments in deep learning. As more models become available, it remains an open question how to effectively combine them into a coherent computational design approach. One approach is to perform computational design with one model, and filter the design candidates with the others. In this work, we demonstrate how an optimization algorithm inspired by evolution can be adapted to provide an alternative framework that outperforms the post hoc filtering approach, especially for problems with multiple competing design specifications. Such a framework may enable researchers to more effectively integrate the strengths of different modeling approaches to tackle more complex design problems.
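The abstract's notion of a Pareto front can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-objective setup, and the score values below are hypothetical, and every objective is assumed to be minimized. A candidate is kept if no other candidate is at least as good in every objective and strictly better in at least one.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b, assuming all objectives are minimized.

    a dominates b when it is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical scores for five candidate sequences under two competing
# objectives (for example, one score per target state); lower is better.
scores = [(0.2, 0.9), (0.4, 0.4), (0.9, 0.1), (0.5, 0.5), (0.3, 0.95)]
front = pareto_front(scores)
print(front)
```

The surviving candidates represent different tradeoffs between the two objectives, which is the property the abstract attributes to an explicit approximation of the Pareto front. NSGA-II builds on this idea by ranking the whole population into successive non-dominated fronts; that ranking step is not shown here.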
Date: 2024
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011953 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 11953&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1011953
DOI: 10.1371/journal.pcbi.1011953