Towards conversational diagnostic artificial intelligence
Tao Tu,
Mike Schaekermann,
Anil Palepu,
Khaled Saab,
Jan Freyberg,
Ryutaro Tanno,
Amy Wang,
Brenna Li,
Mohamed Amin,
Yong Cheng,
Elahe Vedadi,
Nenad Tomasev,
Shekoofeh Azizi,
Karan Singhal,
Le Hou,
Albert Webson,
Kavita Kulkarni,
S. Sara Mahdavi,
Christopher Semturs,
Juraj Gottweis,
Joelle Barral,
Katherine Chou,
Greg S. Corrado,
Yossi Matias,
Alan Karthikesalingam and
Vivek Natarajan
Author affiliations:
Tao Tu: Google Research
Mike Schaekermann: Google Research
Anil Palepu: Google Research
Khaled Saab: Google Research
Jan Freyberg: Google Research
Ryutaro Tanno: Google DeepMind
Amy Wang: Google Research
Brenna Li: Google Research
Mohamed Amin: Google Research
Yong Cheng: Google DeepMind
Elahe Vedadi: Google Research
Nenad Tomasev: Google DeepMind
Shekoofeh Azizi: Google DeepMind
Karan Singhal: Google Research
Le Hou: Google Research
Albert Webson: Google DeepMind
Kavita Kulkarni: Google Research
S. Sara Mahdavi: Google DeepMind
Christopher Semturs: Google Research
Juraj Gottweis: Google Research
Joelle Barral: Google DeepMind
Katherine Chou: Google Research
Greg S. Corrado: Google Research
Yossi Matias: Google Research
Alan Karthikesalingam: Google Research
Vivek Natarajan: Google Research
Nature, 2025, vol. 642, issue 8067, 442-450
Abstract:
At the heart of medicine lies physician–patient dialogue, where skillful history-taking enables effective diagnosis, management and enduring trust [1,2]. Artificial intelligence (AI) systems capable of diagnostic dialogue could increase the accessibility and quality of care. However, approximating clinicians’ expertise remains an outstanding challenge. Here we introduce AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM)-based AI system optimized for diagnostic dialogue. AMIE uses a self-play-based [3] simulated environment with automated feedback to scale learning across disease conditions, specialties and contexts. We designed a framework for evaluating clinically meaningful axes of performance, including history-taking, diagnostic accuracy, management, communication skills and empathy. We compared AMIE’s performance with that of primary care physicians in a randomized, double-blind crossover study of text-based consultations with validated patient-actors, in a format similar to an objective structured clinical examination [4,5]. The study included 159 case scenarios from providers in Canada, the United Kingdom and India, 20 primary care physicians who were compared with AMIE, and evaluations by specialist physicians and patient-actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 30 of 32 axes according to the specialist physicians and on 25 of 26 axes according to the patient-actors. Our research has several limitations and should be interpreted with caution. Clinicians used synchronous text chat, which permits large-scale LLM–patient interactions but is unfamiliar in clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.
Date: 2025
Link: https://www.nature.com/articles/s41586-025-08866-7 (abstract, text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:642:y:2025:i:8067:d:10.1038_s41586-025-08866-7
DOI: 10.1038/s41586-025-08866-7
Nature is currently edited by Magdalena Skipper
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.