Evaluating the effectiveness of large language models in medical education: a comparison of current medical knowledge
Md. Mahadi Hassan and
Noushin Nohor
International Journal of Complexity in Applied Science and Technology, 2025, vol. 1, issue 4, 382-396
Abstract:
Recent advancements in artificial intelligence have led to the development of powerful large language models (LLMs) like ChatGPT-4-turbo, Gemini 2.0 Flash, DeepSeek-R1, and Qwen2.5-Max. This study evaluates their medical knowledge proficiency using multiple-choice questions (MCQs) sourced from a reputable medical textbook, with answers verified by experts. Each model was tested on its ability to select correct answers, and performance was analysed using ANOVA and Tukey's HSD tests. Results showed that while all models exhibited some proficiency, ChatGPT-4-turbo significantly outperformed Gemini 2.0 Flash and Qwen2.5-Max, with no notable difference between ChatGPT-4-turbo and DeepSeek-R1. Despite their capabilities, these models remain unreliable for medical education and assistance. Enhancing their accuracy and reliability is crucial for their effective application in healthcare, enabling medical students and professionals to utilise AI for learning and clinical decision-making. Further development is needed to improve their utility in medical practice.
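The analysis described above (one-way ANOVA across models, followed by Tukey's HSD for pairwise comparisons) can be sketched as follows. This is a minimal illustration only: the per-question scores are synthetic random data and the accuracy rates are hypothetical, not the paper's results, and treating binary correct/incorrect scores with ANOVA is a simplification.

```python
# Hedged sketch of the ANOVA + Tukey's HSD comparison; all data here
# is synthetic and the accuracy rates are illustrative assumptions.
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

rng = np.random.default_rng(0)

# 1 = correct, 0 = incorrect, for 100 hypothetical MCQs per model.
scores = {
    "chatgpt4turbo": rng.binomial(1, 0.90, 100),
    "deepseek_r1":   rng.binomial(1, 0.87, 100),
    "gemini_flash":  rng.binomial(1, 0.75, 100),
    "qwen_max":      rng.binomial(1, 0.74, 100),
}

# One-way ANOVA: do mean accuracies differ across the four models?
f_stat, p_value = f_oneway(*scores.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Tukey's HSD: which specific pairs of models differ?
res = tukey_hsd(*scores.values())
print(res)
```

`tukey_hsd` returns a matrix of pairwise statistics and p-values, which is what lets a study report findings such as "model A outperformed models B and C, with no notable difference between A and D."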
Keywords: large language models; LLMs; artificial intelligence; ChatGPT; Gemini; DeepSeek; Qwen.
Date: 2025
Downloads: http://www.inderscience.com/link.php?id=147091 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijcast:v:1:y:2025:i:4:p:382-396
More articles in International Journal of Complexity in Applied Science and Technology from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.