MolE: a foundation model for molecular graphs using disentangled attention
Oscar Méndez-Lucio (),
Christos A. Nicolaou and
Berton Earnshaw ()
Additional contact information
Oscar Méndez-Lucio: Recursion
Christos A. Nicolaou: Recursion
Berton Earnshaw: Recursion
Nature Communications, 2024, vol. 15, issue 1, 1-9
Abstract:
Abstract Models that accurately predict properties based on chemical structure are valuable tools in the chemical sciences. However, for many properties, public and private training sets are typically small, making it difficult for models to generalize well outside of the training data. Recently, this lack of generalization has been mitigated by using self-supervised pretraining on large unlabeled datasets, followed by finetuning on smaller, labeled datasets. Inspired by these advances, we report MolE, a Transformer architecture adapted for molecular graphs together with a two-step pretraining strategy. The first step of pretraining is a self-supervised approach focused on learning chemical structures trained on ~842 million molecular graphs, and the second step is a massive multi-task approach to learn biological information. We show that finetuning models that were pretrained in this way perform better than the best published results on 10 of the 22 ADMET (absorption, distribution, metabolism, excretion and toxicity) tasks included in the Therapeutic Data Commons leaderboard (c. September 2023).
Date: 2024
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.nature.com/articles/s41467-024-53751-y Abstract (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-53751-y
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-024-53751-y
Access Statistics for this article
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().