An automatic end-to-end chemical synthesis development platform powered by large language models

Ruan, Yixiang; Lu, Chenyin; Xu, Ning; He, Yuchen; Chen, Yixin; Zhang, Jian; Xuan, Jun; Pan, Jianzhang; Fang, Qun; Gao, Hanyu; Shen, Xiaodong; Ye, Ning; Zhang, Qiang; Mo, Yiming

Yixiang Ruan, Chenyin Lu, Ning Xu, Yuchen He, Yixin Chen, Jian Zhang, Jun Xuan, Jianzhang Pan, Qun Fang, Hanyu Gao, Xiaodong Shen, Ning Ye, Qiang Zhang and Yiming Mo
Additional contact information
Yixiang Ruan: Zhejiang University
Chenyin Lu: ZJU-Hangzhou Global Scientific and Technological Innovation Center
Ning Xu: Zhejiang University
Yuchen He: Zhejiang University
Yixin Chen: Zhejiang University
Jian Zhang: ZJU-Hangzhou Global Scientific and Technological Innovation Center
Jun Xuan: ZJU-Hangzhou Global Scientific and Technological Innovation Center
Jianzhang Pan: ZJU-Hangzhou Global Scientific and Technological Innovation Center
Qun Fang: ZJU-Hangzhou Global Scientific and Technological Innovation Center
Hanyu Gao: The Hong Kong University of Science and Technology
Xiaodong Shen: Suzhou Novartis Technical Development Co. Ltd.
Ning Ye: Rezubio Pharmaceuticals Co. Ltd.
Qiang Zhang: ZJU-Hangzhou Global Scientific and Technological Innovation Center
Yiming Mo: Zhejiang University

Nature Communications, 2024, vol. 15, issue 1, 1-16

Abstract: The rapid emergence of large language model (LLM) technology presents promising opportunities to facilitate the development of synthetic reactions. In this work, we leveraged the power of GPT-4 to build an LLM-based reaction development framework (LLM-RDF) to handle the fundamental tasks involved in chemical synthesis development. LLM-RDF comprises six specialized LLM-based agents (Literature Scouter, Experiment Designer, Hardware Executor, Spectrum Analyzer, Separation Instructor, and Result Interpreter), which are pre-prompted to accomplish their designated tasks. A web application with LLM-RDF as the backend was built to allow chemist users to interact with automated experimental platforms and analyze results via natural language, thus eliminating the need for coding skills and ensuring accessibility for all chemists. We demonstrated the capabilities of LLM-RDF in guiding the end-to-end synthesis development process for the copper/TEMPO-catalyzed aerobic oxidation of alcohols to aldehydes, including literature search and information extraction, substrate scope and condition screening, reaction kinetics study, reaction condition optimization, reaction scale-up, and product purification. Furthermore, LLM-RDF’s broader applicability and versatility were validated on various synthesis tasks for three distinct reactions (SNAr reaction, photoredox C-C cross-coupling reaction, and heterogeneous photoelectrochemical reaction).

Date: 2024

Downloads: (external link)
https://www.nature.com/articles/s41467-024-54457-x Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-54457-x

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-024-54457-x

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.