Sequence-structure-function relationships in the microbial protein universe
Julia Koehler Leman,
Pawel Szczerbiak,
P. Douglas Renfrew,
Vladimir Gligorijevic,
Daniel Berenberg,
Tommi Vatanen,
Bryn C. Taylor,
Chris Chandler,
Stefan Janssen,
Andras Pataki,
Nick Carriero,
Ian Fisk,
Ramnik J. Xavier,
Rob Knight,
Richard Bonneau and
Tomasz Kosciolek
Additional contact information
Julia Koehler Leman: Flatiron Institute, Simons Foundation
Pawel Szczerbiak: Jagiellonian University
P. Douglas Renfrew: Flatiron Institute, Simons Foundation
Vladimir Gligorijevic: Flatiron Institute, Simons Foundation
Daniel Berenberg: Flatiron Institute, Simons Foundation
Tommi Vatanen: Broad Institute
Bryn C. Taylor: University of California San Diego
Chris Chandler: Flatiron Institute, Simons Foundation
Stefan Janssen: University of California San Diego
Andras Pataki: Flatiron Institute, Simons Foundation
Nick Carriero: Flatiron Institute, Simons Foundation
Ian Fisk: Flatiron Institute, Simons Foundation
Ramnik J. Xavier: Broad Institute
Rob Knight: University of California San Diego
Richard Bonneau: Flatiron Institute, Simons Foundation
Tomasz Kosciolek: Jagiellonian University
Nature Communications, 2023, vol. 14, issue 1, 1-11
Abstract:
For the past half-century, structural biologists relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that don’t rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life and annotate them functionally on a per-residue basis. Structure prediction is accomplished using the World Community Grid, a large-scale citizen science initiative. The resulting database of structural models is complementary to the AlphaFold database, with regards to domains of life as well as sequence diversity and sequence length. We identify 148 novel folds and describe examples where we map specific functions to structural motifs. We also show that the structural space is continuous and largely saturated, highlighting the need for a shift in focus across all branches of biology, from obtaining structures to putting them into context and from sequence-based to sequence-structure-function based meta-omics analyses.
Date: 2023
Downloads: (external link)
https://www.nature.com/articles/s41467-023-37896-w Abstract (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-37896-w
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-023-37896-w
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.