A comparative analysis of the performance of leading large language models on the endodontics section of the dentistry specialization exam in Türkiye
Beyhan Başkan, Hatice Kübra Başkan and Nevzat Koç
PLOS ONE, 2026, vol. 21, issue 6, 1-13
Abstract:
Objective: This study aimed to evaluate and compare the performance of eight contemporary large language models (LLMs) on the endodontics section of the Specialization Exam in Dentistry (DUS), assessing their accuracy in both theoretical knowledge and simulated clinical scenarios from historical exam data. Methods: Eight LLMs (Claude 4, DeepSeek V3, Gemini 2.5 Pro, ChatGPT-4o, ChatGPT-5, Grok 4, LLaMA 4, and Perplexity) were evaluated using 127 multiple-choice endodontics questions from DUS exams administered by the Student Selection and Placement Center (ÖSYM) between 2012 and 2021. The models’ responses were compared against the official answer keys. Statistical analyses were performed using Pearson’s chi-square and McNemar tests, with a significance level of α = 0.05. Results: Significant differences existed among LLMs in overall accuracy (p 0.05). Conclusion: Contemporary LLMs demonstrate substantial competence in endodontic knowledge, with Gemini 2.5 Pro excelling in both theoretical and clinical queries. However, significant performance variability across models (61.4%–90.6%) and the complexity of retrieving and resolving clinical exam queries necessitate domain-specific optimization and expert oversight for reliable integration into dental education and practice.
Date: 2026
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0350457 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 50457&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0350457
DOI: 10.1371/journal.pone.0350457
More articles in PLOS ONE from Public Library of Science