Benchmarking large-language-model vision capabilities in oral and maxillofacial anatomy: A cross-sectional study

Viet Anh Nguyen, Thi Quynh Trang Vuong and Nguyen Van Hung

PLOS ONE, 2025, vol. 20, issue 10, 1-13

Abstract: Background: Multimodal large-language models (LLMs) have recently gained the ability to interpret images. However, their accuracy on anatomy tasks remains unclear. Methods: A cross-sectional, atlas-based benchmark study was conducted in which six publicly accessible chat endpoints, including paired “deep-reasoning” and “low-latency” modes from OpenAI, Microsoft Copilot, and Google Gemini, identified 260 numbered landmarks on 26 high-resolution plates from a classical anatomic atlas. Each image was processed twice per model. Two blinded anatomy lecturers scored the responses; accuracy, run-to-run consistency, and per-label latency were compared across models with χ² and Kruskal–Wallis tests. Results: Overall accuracy differed significantly among models (χ² = 73.2, P
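
As a rough illustration of the two statistical comparisons named in the abstract (a sketch, not the authors' analysis code), the Python snippet below applies SciPy's chi-square test of independence to hypothetical correct/incorrect counts per endpoint and a Kruskal–Wallis test to hypothetical per-label latencies. All counts, latencies, and endpoint labels are placeholders, not data from the study.

# Minimal sketch (placeholder data, not the study's results) of the two
# reported tests: chi-square on accuracy counts, Kruskal-Wallis on latency.
import numpy as np
from scipy.stats import chi2_contingency, kruskal

# Hypothetical contingency table: one row per chat endpoint,
# columns = [correct, incorrect] out of 520 labels (260 landmarks x 2 runs).
accuracy_counts = np.array([
    [430,  90],   # vendor A, deep-reasoning mode (placeholder)
    [400, 120],   # vendor A, low-latency mode    (placeholder)
    [370, 150],   # vendor B, deep-reasoning mode (placeholder)
    [340, 180],   # vendor B, low-latency mode    (placeholder)
    [310, 210],   # vendor C, deep-reasoning mode (placeholder)
    [280, 240],   # vendor C, low-latency mode    (placeholder)
])
chi2, p_acc, dof, _ = chi2_contingency(accuracy_counts)
print(f"accuracy: chi2 = {chi2:.1f}, df = {dof}, p = {p_acc:.2g}")

# Hypothetical per-label latencies (seconds) for the same six endpoints.
rng = np.random.default_rng(0)
latencies = [rng.gamma(shape=2.0, scale=s, size=520) for s in (3.0, 1.0, 2.5, 0.8, 2.0, 0.7)]
h_stat, p_lat = kruskal(*latencies)
print(f"latency: Kruskal-Wallis H = {h_stat:.1f}, p = {p_lat:.2g}")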

Date: 2025

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0335775 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 35775&type=printable (application/pdf)



Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0335775

DOI: 10.1371/journal.pone.0335775


More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone.

 
Handle: RePEc:plo:pone00:0335775