Performance benchmarking of LLMs on Chinese national medical licensing education: Cross-lingual and question-type effects
Yuxia Tang,
Jian Chen and
Shouju Wang
PLOS ONE, 2026, vol. 21, issue 4, 1-8
Abstract:
Background: Cross-lingual and question-type variations affecting large language model (LLM) accuracy on the Chinese national medical licensing examination remain insufficiently explored.
Methods: In this cross-sectional study (May 13–20, 2025), 396 educational questions (198 English–Chinese pairs) were extracted from the Chinese national medical licensing examination. ChatGPT-4o, ChatGPT-o3, Gemini-2.5-pro, Deepseek-V3, Deepseek-R1, and Doubao-1.5-pro were prompted to provide answers. Responses were compared against reference answers, and accuracy was computed for three question types: basic knowledge (Type A), case analysis (Type B), and integrative judgment (Type C).
Results: Across all question types and languages, Doubao-1.5-pro achieved the highest accuracy at 92.0% ± 1.3%, whereas ChatGPT-4o had the lowest at 82.8% ± 3.7%. There was a significant main effect of question type (P = 0.0038) but no main effect of language (P = 0.56). Post hoc tests confirmed that Type A performance exceeded Types B and C (P
Date: 2026
Downloads:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0346518 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 46518&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0346518
DOI: 10.1371/journal.pone.0346518