Mapping the glycosyltransferase fold landscape using interpretable deep learning
Rahil Taujale,
Zhongliang Zhou,
Wayland Yeung,
Kelley W. Moremen,
Sheng Li and
Natarajan Kannan
Additional contact information
Rahil Taujale: Institute of Bioinformatics, University of Georgia
Zhongliang Zhou: Department of Computer Science, University of Georgia
Wayland Yeung: Institute of Bioinformatics, University of Georgia
Kelley W. Moremen: Complex Carbohydrate Research Center, University of Georgia
Sheng Li: Department of Computer Science, University of Georgia
Natarajan Kannan: Institute of Bioinformatics, University of Georgia
Nature Communications, 2021, vol. 12, issue 1, 1-12
Abstract:
Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learns distinguishing secondary structure features free of primary sequence alignment constraints and is highly interpretable. It delineates sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies expand the GT fold landscape and prioritize targets for future structural studies.
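To make the idea of a CNN with attention over secondary structure strings more concrete, the following is a minimal, dependency-free sketch. It is not the authors' model: the three-state alphabet (H, E, C), the hand-set filters, the query vector, the two hypothetical output classes and every function name are assumptions chosen for illustration, and nothing here is trained. It only shows the general flow of one-hot encoding, a 1D convolution, attention pooling and a softmax over classes.

```python
import math

# Assumption: a 3-state secondary structure alphabet
# (H = helix, E = strand, C = coil). The paper's encoding may differ.
SS_ALPHABET = "HEC"


def one_hot(ss):
    """Encode a secondary structure string as a list of one-hot vectors."""
    return [[1.0 if ch == a else 0.0 for a in SS_ALPHABET] for ch in ss]


def conv1d(x, kernels, bias):
    """Valid 1D convolution followed by ReLU.

    x: L x C_in; kernels: n_filters x width x C_in.
    Returns a (L - width + 1) x n_filters feature map.
    """
    width = len(kernels[0])
    channels = len(x[0])
    out = []
    for i in range(len(x) - width + 1):
        row = []
        for kernel, b in zip(kernels, bias):
            s = b + sum(kernel[j][c] * x[i + j][c]
                        for j in range(width) for c in range(channels))
            row.append(max(0.0, s))
        out.append(row)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Dot-product attention pooling over positions.

    Returns the attention-weighted feature vector and the per-position
    weights; the weights are what one would inspect for interpretability.
    """
    scores = [sum(f * q for f, q in zip(row, query)) for row in features]
    weights = softmax(scores)
    pooled = [sum(w * row[d] for w, row in zip(weights, features))
              for d in range(len(query))]
    return pooled, weights


def classify(ss, kernels, bias, query, out_weights):
    """Full forward pass: encode, convolve, pool with attention, classify."""
    feats = conv1d(one_hot(ss), kernels, bias)
    pooled, attn = attention_pool(feats, query)
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in out_weights]
    return softmax(logits), attn


# Hypothetical, hand-set parameters (a real model would learn these).
HELIX_FILTER = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
STRAND_FILTER = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
KERNELS = [HELIX_FILTER, STRAND_FILTER]
BIAS = [0.0, 0.0]
QUERY = [1.0, 1.0]
OUT_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical fold classes

probs, attn = classify("CCHHHHHHCC", KERNELS, BIAS, QUERY, OUT_WEIGHTS)
```

In this toy setup the attention weights peak over the helical stretch, which loosely mirrors how attention weights can point to the positions that drive a prediction. A real implementation would use a deep learning framework and learned parameters.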
Date: 2021
Downloads: https://www.nature.com/articles/s41467-021-25975-9 (abstract, text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-25975-9
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-021-25975-9
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.