Protein design and variant prediction using autoregressive generative models
Jung-Eun Shin,
Adam J. Riesselman,
Aaron W. Kollasch,
Conor McMahon,
Elana Simon,
Chris Sander,
Aashish Manglik,
Andrew C. Kruse and
Debora S. Marks
Additional contact information
Jung-Eun Shin: Harvard Medical School
Adam J. Riesselman: Harvard Medical School
Aaron W. Kollasch: Harvard Medical School
Conor McMahon: Harvard Medical School
Elana Simon: Harvard College
Chris Sander: Harvard Medical School
Aashish Manglik: University of California San Francisco
Andrew C. Kruse: Harvard Medical School
Debora S. Marks: Harvard Medical School
Nature Communications, 2021, vol. 12, issue 1, 1-11
Abstract:
The ability to design functional sequences and predict effects of variation is central to protein engineering and biotherapeutics. State-of-the-art computational methods rely on models that leverage evolutionary information but are inadequate for important applications where multiple sequence alignments are not robust. Such applications include the prediction of variant effects of indels, disordered proteins, and the design of proteins such as antibodies due to the highly variable complementarity determining regions. We introduce a deep generative model adapted from natural language processing for prediction and design of diverse functional sequences without the need for alignments. The model performs state-of-the-art prediction of missense and indel effects, and we successfully design and test a diverse 10^5-nanobody library that shows better expression than a 1000-fold larger synthetic library. Our results demonstrate the power of the alignment-free autoregressive model in generalizing to regions of sequence space traditionally considered beyond the reach of prediction and design.
Date: 2021
Download: https://www.nature.com/articles/s41467-021-22732-w (abstract, text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-22732-w
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-021-22732-w
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.