De novo protein design by deep network hallucination

Anishchenko, Ivan; Pellock, Samuel J.; Chidyausiku, Tamuka M.; Ramelot, Theresa A.; Ovchinnikov, Sergey; Hao, Jingzhou; Bafna, Khushboo; Norn, Christoffer; Kang, Alex; Bera, Asim K.; DiMaio, Frank; Carter, Lauren; Chow, Cameron M.; Montelione, Gaetano T.; Baker, David

Additional contact information
Ivan Anishchenko: University of Washington
Samuel J. Pellock: University of Washington
Tamuka M. Chidyausiku: University of Washington
Theresa A. Ramelot: Rensselaer Polytechnic Institute
Sergey Ovchinnikov: Harvard University
Jingzhou Hao: Rensselaer Polytechnic Institute
Khushboo Bafna: Rensselaer Polytechnic Institute
Christoffer Norn: University of Washington
Alex Kang: University of Washington
Asim K. Bera: University of Washington
Frank DiMaio: University of Washington
Lauren Carter: University of Washington
Cameron M. Chow: University of Washington
Gaetano T. Montelione: Rensselaer Polytechnic Institute
David Baker: University of Washington

Nature, 2021, vol. 600, issue 7889, 547-552

Abstract: There has been considerable recent progress in protein structure prediction using deep neural networks to predict inter-residue distances from amino acid sequences [1–3]. Here we investigate whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generate random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting residue–residue distance maps, which, as expected, are quite featureless. We then carry out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (Kullback–Leibler divergence) between the inter-residue distance distributions predicted by the network and background distributions averaged over all proteins. Optimization from different random starting points resulted in novel proteins spanning a wide range of sequences and predicted structures. We obtained synthetic genes encoding 129 of the network-‘hallucinated’ sequences, and expressed and purified the proteins in Escherichia coli; 27 of the proteins yielded monodisperse species with circular dichroism spectra consistent with the hallucinated structures. We determined the three-dimensional structures of three of the hallucinated proteins, two by X-ray crystallography and one by NMR, and these closely matched the hallucinated models. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute alongside traditional physics-based models to the de novo design of proteins with new functions.
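
Illustrative sketch (not part of the published record): the sampling loop described in the abstract can be outlined in a few lines of Python. This is a toy under stated assumptions, not the authors' implementation. The function `toy_predict` is a hypothetical stand-in for the trRosetta network, the background is assumed uniform rather than averaged over real proteins, and the single-site mutation move with Metropolis acceptance is one common Monte Carlo choice that the abstract does not specify. All names and parameter values here are invented for illustration.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"      # the 20 standard amino acids
N_BINS = 8                             # number of distance bins (arbitrary toy value)
BACKGROUND = [1.0 / N_BINS] * N_BINS   # ASSUMPTION: uniform background distribution


def toy_predict(seq):
    """Hypothetical stand-in for a structure-prediction network.

    Returns one distance distribution (N_BINS probabilities) for every
    residue pair (i, j) with i < j. It has no physical meaning.
    """
    maps = {}
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            a, b = ALPHABET.index(seq[i]), ALPHABET.index(seq[j])
            peak = (a + b) % N_BINS        # preferred bin for this pair
            sharp = ((a * b) % 5) / 2.0    # 0.0 (flat) up to 2.0 (peaked)
            logits = [-sharp * abs(k - peak) for k in range(N_BINS)]
            z = sum(math.exp(x) for x in logits)
            maps[(i, j)] = [math.exp(x) / z for x in logits]
    return maps


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def contrast(seq):
    """Mean KL divergence between predicted and background distributions."""
    maps = toy_predict(seq)
    return sum(kl_divergence(p, BACKGROUND) for p in maps.values()) / len(maps)


def hallucinate(length=12, steps=500, temperature=0.05, seed=0):
    """Monte Carlo search in sequence space that maximizes the contrast.

    ASSUMPTION: single-site mutations with Metropolis acceptance.
    Returns (best sequence, starting score, best score).
    """
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in range(length)]
    score = contrast(seq)
    start_score = score
    best_seq, best_score = list(seq), score
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)    # propose a point mutation
        new_score = contrast(seq)
        delta = new_score - score
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            score = new_score              # accept the move
            if score > best_score:
                best_seq, best_score = list(seq), score
        else:
            seq[pos] = old                 # reject and restore the residue
    return "".join(best_seq), start_score, best_score


if __name__ == "__main__":
    designed, start, best = hallucinate()
    print(designed, round(start, 3), round(best, 3))
```

In this sketch a random starting sequence plays the role of the "featureless" initial map, and the search drives the predicted distributions away from the background. Replacing `toy_predict` with a real network and `BACKGROUND` with an empirically averaged distribution would be needed for anything resembling the published method.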

Date: 2021

Downloads: (external link)
https://www.nature.com/articles/s41586-021-04184-w Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:600:y:2021:i:7889:d:10.1038_s41586-021-04184-w

DOI: 10.1038/s41586-021-04184-w

Nature is currently edited by Magdalena Skipper

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.