An automated framework for assessing how well LLMs cite relevant medical references

Wu, Kevin; Wu, Eric; Wei, Kevin; Zhang, Angela; Casasola, Allison; Nguyen, Teresa; Riantawan, Sith; Shi, Patricia; Ho, Daniel; Zou, James

Kevin Wu, Eric Wu, Kevin Wei, Angela Zhang, Allison Casasola, Teresa Nguyen, Sith Riantawan, Patricia Shi, Daniel Ho and James Zou
Additional contact information
Kevin Wu: Stanford University
Eric Wu: Stanford University
Kevin Wei: Keck Medicine of USC
Angela Zhang: Stanford University
Allison Casasola: Stanford University
Teresa Nguyen: Stanford University
Sith Riantawan: Keck Medicine of USC
Patricia Shi: Loma Linda University School of Medicine
Daniel Ho: Stanford Law School
James Zou: Stanford University

Nature Communications, 2025, vol. 16, issue 1, 1-10

Abstract: As large language models (LLMs) are increasingly used to address health-related queries, it is crucial that they support their conclusions with credible references. While models can cite sources, the extent to which these support claims remains unclear. To address this gap, we introduce SourceCheckup, an automated agent-based pipeline that evaluates the relevance and supportiveness of sources in LLM responses. We evaluate seven popular LLMs on a dataset of 800 questions and 58,000 pairs of statements and sources on data that represent common medical queries. Our findings reveal that between 50% and 90% of LLM responses are not fully supported, and sometimes contradicted, by the sources they cite. Even for GPT-4o with Web Search, approximately 30% of individual statements are unsupported, and nearly half of its responses are not fully supported. Independent assessments by doctors further validate these results. Our research underscores significant limitations in current LLMs to produce trustworthy medical references.
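The abstract reports support at two levels: individual statements and whole responses. The minimal sketch below shows how the two rates can diverge. It is not the SourceCheckup implementation; the function name `support_rates`, the input layout, and the toy judgments are assumptions made for illustration, and a response is counted as "fully supported" only when every one of its statements is supported.

```python
from collections import defaultdict


def support_rates(judgments):
    """Return (statement-level rate, response-level rate).

    `judgments` is an iterable of (response_id, supported) pairs, one per
    statement. This layout is hypothetical, not taken from the paper.
    """
    per_response = defaultdict(list)
    for response_id, supported in judgments:
        per_response[response_id].append(bool(supported))
    if not per_response:
        raise ValueError("no judgments provided")

    # Statement level: share of all statements judged supported.
    flags = [f for group in per_response.values() for f in group]
    statement_rate = sum(flags) / len(flags)

    # Response level: share of responses whose statements are ALL supported.
    fully_supported = sum(all(group) for group in per_response.values())
    response_rate = fully_supported / len(per_response)
    return statement_rate, response_rate


# Toy data (invented): 4 responses, 10 statements, 2 unsupported statements.
toy_judgments = [
    ("r1", True), ("r1", True), ("r1", False),
    ("r2", True), ("r2", True),
    ("r3", True), ("r3", False), ("r3", True),
    ("r4", True), ("r4", True),
]

statement_rate, response_rate = support_rates(toy_judgments)
print(f"statements supported: {statement_rate:.0%}")
print(f"responses fully supported: {response_rate:.0%}")
```

In this toy set, 8 of 10 statements are supported but only 2 of 4 responses are fully supported, because a single unsupported statement disqualifies the whole response. This is why a response-level figure can look much worse than a statement-level one.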

Date: 2025

Downloads: (external link)
https://www.nature.com/articles/s41467-025-58551-6 Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-58551-6

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-025-58551-6

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.