Evaluating the use of large language models to provide clinical recommendations in the Emergency Department
Christopher Y. K. Williams,
Brenda Y. Miao,
Aaron E. Kornblith and
Atul J. Butte
Additional contact information
Christopher Y. K. Williams: San Francisco
Brenda Y. Miao: San Francisco
Aaron E. Kornblith: San Francisco
Atul J. Butte: San Francisco
Nature Communications, 2024, vol. 15, issue 1, 1-10
Abstract:
The release of GPT-4 and other large language models (LLMs) has the potential to transform healthcare. However, existing research evaluating LLM performance on real-world clinical notes is limited. Here, we conduct a highly powered study to determine whether LLMs can provide clinical recommendations for three tasks (admission status, radiological investigation(s) request status, and antibiotic prescription status) using clinical notes from the Emergency Department. We randomly selected 10,000 Emergency Department visits to evaluate the accuracy of zero-shot, GPT-3.5-turbo- and GPT-4-turbo-generated clinical recommendations across four different prompting strategies. We found that both GPT-4-turbo and GPT-3.5-turbo performed poorly compared to a resident physician, with accuracy scores on average 8% and 24% lower than the physician's, respectively. Both LLMs tended to be overly cautious in their recommendations, with high sensitivity at the cost of specificity. Our findings demonstrate that, while early evaluations of the clinical use of LLMs are promising, LLM performance must be significantly improved before LLMs can be deployed as decision support systems for clinical recommendations and other complex tasks.
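For illustration only: the sketch below shows roughly how a zero-shot clinical recommendation for one of the three tasks (admission status) could be requested from a chat-style LLM API. It is not the authors' evaluation pipeline; the prompt wording, system message, note text, function name, and answer parsing are assumptions made for this sketch.

    # Hypothetical sketch of a zero-shot admission-status query (not the paper's code).
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def recommend_admission(note_text: str, model: str = "gpt-4-turbo") -> str:
        """Ask the model, zero-shot, whether the patient should be admitted."""
        response = client.chat.completions.create(
            model=model,
            temperature=0,  # deterministic output for repeatable evaluation
            messages=[
                {"role": "system",
                 "content": "You are assisting with Emergency Department research."},
                {"role": "user",
                 "content": ("Based on the following Emergency Department note, should "
                             "the patient be admitted to hospital? Answer Yes or No.\n\n"
                             + note_text)},
            ],
        )
        answer = response.choices[0].message.content.strip().lower()
        return "Yes" if answer.startswith("yes") else "No"

    # Example with a synthetic, non-patient note:
    # print(recommend_admission("72-year-old with chest pain; troponin pending ..."))

A binary Yes/No answer of this kind can then be scored against the physician's actual decision to compute accuracy, sensitivity, and specificity across the sampled visits.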
Date: 2024
Downloads: https://www.nature.com/articles/s41467-024-52415-1 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-52415-1
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-024-52415-1
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie