Token-Mol 1.0: tokenized drug design with large language models
Jike Wang,
Rui Qin,
Mingyang Wang,
Meijing Fang,
Yangyang Zhang,
Yuchen Zhu,
Qun Su,
Qiaolin Gou,
Chao Shen,
Odin Zhang,
Zhenxing Wu,
Dejun Jiang,
Xujun Zhang,
Huifeng Zhao,
Jingxuan Ge,
Zhourui Wu,
Yu Kang,
Chang-Yu Hsieh and
Tingjun Hou
Additional contact information
Jike Wang: Zhejiang University
Rui Qin: Zhejiang University
Mingyang Wang: Zhejiang University
Meijing Fang: Zhejiang University
Yangyang Zhang: Zhejiang University
Yuchen Zhu: Zhejiang University
Qun Su: Zhejiang University
Qiaolin Gou: Zhejiang University
Chao Shen: Zhejiang University
Odin Zhang: University of Washington
Zhenxing Wu: Zhejiang University
Dejun Jiang: Zhejiang University
Xujun Zhang: Zhejiang University
Huifeng Zhao: Zhejiang University
Jingxuan Ge: Zhejiang University
Zhourui Wu: Tongji University
Yu Kang: Zhejiang University
Chang-Yu Hsieh: Zhejiang University
Tingjun Hou: Zhejiang University
Nature Communications, 2025, vol. 16, issue 1, 1-19
Abstract:
The integration of large language models (LLMs) into drug design is gaining momentum; however, existing approaches often struggle to effectively incorporate three-dimensional molecular structures. Here, we present Token-Mol, a token-only 3D drug design model that encodes both 2D and 3D structural information, along with molecular properties, into discrete tokens. Built on a transformer decoder and trained with causal masking, Token-Mol introduces a Gaussian cross-entropy loss function tailored for regression tasks, enabling superior performance across multiple downstream applications. The model surpasses existing methods, improving molecular conformation generation by over 10% and 20% on two datasets, respectively, while outperforming token-only models by 30% in property prediction. In pocket-based molecular generation, it enhances drug-likeness and synthetic accessibility by approximately 11% and 14%, respectively. Notably, Token-Mol operates 35 times faster than expert diffusion models. In real-world validation, it improves success rates and, when combined with reinforcement learning, further optimizes affinity and drug-likeness, advancing AI-driven drug discovery.
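The abstract names a Gaussian cross-entropy loss for regression but does not define it. A minimal sketch, assuming numeric values are emitted as discrete tokens with known bin centers: the one-hot target of ordinary cross-entropy is replaced by a normalized Gaussian centered on the true value, so a numerically nearby token is penalized less than a distant one. This is one plausible reading, not the authors' implementation; the helper names, the digit-token bin layout and the `sigma` width are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_soft_labels(bin_centers: Sequence[float], target: float, sigma: float) -> List[float]:
    """Spread a scalar target over numeric tokens as a normalized Gaussian (hypothetical helper)."""
    weights = [math.exp(-((c - target) ** 2) / (2.0 * sigma ** 2)) for c in bin_centers]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the numeric-token logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def gaussian_cross_entropy(logits: Sequence[float], bin_centers: Sequence[float],
                           target: float, sigma: float = 1.0) -> float:
    """Cross-entropy against Gaussian soft labels instead of a one-hot label."""
    q = gaussian_soft_labels(bin_centers, target, sigma)
    p = softmax(logits)
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p))


if __name__ == "__main__":
    bins = [float(i) for i in range(10)]  # hypothetical numeric tokens "0".."9"
    near = [0.0] * 10
    near[4] = 5.0  # model favours token 4 while the true value is 3
    far = [0.0] * 10
    far[9] = 5.0   # model favours token 9
    loss_near = gaussian_cross_entropy(near, bins, target=3.0)
    loss_far = gaussian_cross_entropy(far, bins, target=3.0)
    print(f"near miss: {loss_near:.3f}, far miss: {loss_far:.3f}")
```

Under plain one-hot cross-entropy both predictions would score the same, since neither puts much probability on token 3; the Gaussian target rewards the closer guess, and `sigma` sets how far that tolerance extends.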
Date: 2025
Downloads: https://www.nature.com/articles/s41467-025-59628-y (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59628-y
Ordering information: This journal article can be ordered from https://www.nature.com/ncomms/
DOI: 10.1038/s41467-025-59628-y
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.