Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study

Takanobu Hirosawa, Yukinori Harada, Masashi Yokose, Tetsu Sakamoto, Ren Kawamura and Taro Shimizu
Additional contact information
Takanobu Hirosawa: Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan
Yukinori Harada: Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan
Masashi Yokose: Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan
Tetsu Sakamoto: Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan
Ren Kawamura: Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan
Taro Shimizu: Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan

IJERPH, 2023, vol. 20, issue 4, 1-10

Abstract: The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within the ten differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis by physicians was still superior to that by ChatGPT-3 within the five differential-diagnosis lists (98.3% vs. 83.3%, p = 0.03). The rate of correct diagnosis by physicians was also superior to that by ChatGPT-3 within the top diagnosis (93.3% vs. 53.3%, p < 0.001). The rate of consistent differential diagnoses among physicians within the ten differential-diagnosis lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints. This suggests that AI chatbots such as ChatGPT-3 can generate a well-differentiated diagnosis list for common chief complaints. However, the ordering within these lists leaves room for improvement.
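For readers who want to sanity-check the reported figures, the short Python sketch below reproduces the rate arithmetic that is explicit in the abstract and illustrates a paired physician-versus-chatbot comparison. The abstract does not name the statistical test used, so McNemar's test is an assumption here, and the 2x2 outcome counts are hypothetical placeholders chosen only so that the row and column margins match the reported top-diagnosis rates (93.3% vs. 53.3% over 30 vignettes).

# Sanity check of the rates quoted in the abstract.
# NOTE: McNemar's test is an assumed choice of paired test; the 2x2 counts
# below are hypothetical, constructed only to match the reported margins.
from statsmodels.stats.contingency_tables import mcnemar

top10_rate = 28 / 30   # ChatGPT-3 correct within ten-item lists -> 93.3%
consistency = 62 / 88  # physician-consistent differentials      -> 70.5%
print(f"Ten-item list accuracy:   {top10_rate:.1%}")
print(f"Differential consistency: {consistency:.1%}")

# Hypothetical paired outcomes per vignette (rows: physician correct /
# incorrect; columns: ChatGPT-3 correct / incorrect).
# Margins: physicians 15 + 13 = 28/30, ChatGPT-3 15 + 1 = 16/30.
table = [[15, 13],
         [1, 1]]
result = mcnemar(table, exact=True)  # exact binomial test on discordant pairs
print(f"McNemar p-value: {result.pvalue:.4f}")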

Keywords: artificial intelligence; generative pretrained transformers; diagnostic accuracy; diagnosis; clinical decision support; AI chatbot; natural language processing
JEL-codes: I I1 I3 Q Q5
Date: 2023
Citations: 4

Downloads:
https://www.mdpi.com/1660-4601/20/4/3378/pdf (application/pdf)
https://www.mdpi.com/1660-4601/20/4/3378/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:20:y:2023:i:4:p:3378-:d:1068780

IJERPH is currently edited by Ms. Jenna Liu

Handle: RePEc:gam:jijerp:v:20:y:2023:i:4:p:3378-:d:1068780