Benchmarking large-language-model vision capabilities in oral and maxillofacial anatomy: A cross-sectional study
Viet Anh Nguyen,
Thi Quynh Trang Vuong and
Nguyen Van Hung
PLOS ONE, 2025, vol. 20, issue 10, 1-13
Abstract:
Background: Multimodal large language models (LLMs) have recently gained the ability to interpret images. However, their accuracy on anatomy tasks remains unclear. Methods: A cross-sectional, atlas-based benchmark study was conducted in which six publicly accessible chat endpoints, comprising paired “deep-reasoning” and “low-latency” modes from OpenAI, Microsoft Copilot, and Google Gemini, identified 260 numbered landmarks on 26 high-resolution plates from a classical anatomic atlas. Each image was processed twice per model. Two blinded anatomy lecturers scored the responses; accuracy, run-to-run consistency, and per-label latency were compared with χ² and Kruskal–Wallis tests. Results: Overall accuracy differed significantly among models (χ² = 73.2, P
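The abstract's analysis (a χ² test on per-model accuracy and a Kruskal–Wallis test on per-label latency) can be illustrated with a minimal Python sketch using scipy.stats. All counts, model labels, and latency values below are placeholders, not the study's data; only the test procedures follow the methods described above.

```python
# Illustrative sketch of the statistical comparisons described in the abstract:
# chi-square test on per-model accuracy counts, Kruskal-Wallis test on latencies.
# All numbers and endpoint names are hypothetical placeholders.
import numpy as np
from scipy.stats import chi2_contingency, kruskal

# Hypothetical correct / incorrect label counts per endpoint
# (260 landmarks x 2 runs = 520 labels per model).
accuracy_counts = {
    "gpt_reasoning":     (430, 90),
    "gpt_fast":          (380, 140),
    "copilot_reasoning": (400, 120),
    "copilot_fast":      (350, 170),
    "gemini_reasoning":  (410, 110),
    "gemini_fast":       (360, 160),
}
table = np.array(list(accuracy_counts.values()))   # rows: models, cols: correct/incorrect
chi2, p_acc, dof, _ = chi2_contingency(table)
print(f"accuracy: chi2={chi2:.1f}, dof={dof}, p={p_acc:.3g}")

# Hypothetical per-label latencies (seconds) for each endpoint.
rng = np.random.default_rng(0)
latencies = [rng.lognormal(mean=mu, sigma=0.4, size=520)
             for mu in (1.8, 0.9, 1.6, 1.0, 1.7, 0.8)]
h_stat, p_lat = kruskal(*latencies)
print(f"latency: H={h_stat:.1f}, p={p_lat:.3g}")
```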
Date: 2025
Downloads:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0335775 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 35775&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0335775
DOI: 10.1371/journal.pone.0335775