Uncovering new families and folds in the natural protein universe
Janani Durairaj,
Andrew M. Waterhouse,
Toomas Mets,
Tetiana Brodiazhenko,
Minhal Abdullah,
Gabriel Studer,
Gerardo Tauriello,
Mehmet Akdel,
Antonina Andreeva,
Alex Bateman,
Tanel Tenson,
Vasili Hauryliuk,
Torsten Schwede and
Joana Pereira
Additional contact information
Janani Durairaj: Biozentrum, University of Basel
Andrew M. Waterhouse: Biozentrum, University of Basel
Toomas Mets: University of Tartu
Tetiana Brodiazhenko: University of Tartu
Minhal Abdullah: University of Tartu
Gabriel Studer: Biozentrum, University of Basel
Gerardo Tauriello: Biozentrum, University of Basel
Mehmet Akdel: VantAI
Antonina Andreeva: European Bioinformatics Institute (EMBL-EBI)
Alex Bateman: European Bioinformatics Institute (EMBL-EBI)
Tanel Tenson: University of Tartu
Vasili Hauryliuk: University of Tartu
Torsten Schwede: Biozentrum, University of Basel
Joana Pereira: Biozentrum, University of Basel
Nature, 2023, vol. 622, issue 7983, 646-653
Abstract:
We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database [1]. These models cover nearly all known proteins, including those that are challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this ‘dark matter’ of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to the Pfam database [2] and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin–antitoxin systems, TumE–TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light on uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.
Date: 2023
Downloads: https://www.nature.com/articles/s41586-023-06622-3 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:622:y:2023:i:7983:d:10.1038_s41586-023-06622-3
Ordering information: This journal article can be ordered from
https://www.nature.com/
DOI: 10.1038/s41586-023-06622-3
Nature is currently edited by Magdalena Skipper
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.