EconPapers    
Economics at your fingertips  
 

Merizo: a rapid and accurate protein domain segmentation method using invariant point attention

Andy M. Lau, Shaun M. Kandathil and David T. Jones ()
Additional contact information
Andy M. Lau: University College London
Shaun M. Kandathil: University College London
David T. Jones: University College London

Nature Communications, 2023, vol. 14, issue 1, 1-11

Abstract: Abstract The AlphaFold Protein Structure Database, containing predictions for over 200 million proteins, has been met with enthusiasm over its potential in enriching structural biological research and beyond. Currently, access to the database is precluded by an urgent need for tools that allow the efficient traversal, discovery, and documentation of its contents. Identifying domain regions in the database is a non-trivial endeavour and doing so will aid our understanding of protein structure and function, while facilitating drug discovery and comparative genomics. Here, we describe a deep learning method for domain segmentation called Merizo, which learns to cluster residues into domains in a bottom-up manner. Merizo is trained on CATH domains and fine-tuned on AlphaFold2 models via self-distillation, enabling it to be applied to both experimental and AlphaFold2 models. As proof of concept, we apply Merizo to the human proteome, identifying 40,818 putative domains that can be matched to CATH representative domains.

Date: 2023
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
https://www.nature.com/articles/s41467-023-43934-4 Abstract (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-43934-4

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-023-43934-4

Access Statistics for this article

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-05-10
Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-43934-4