Evaluating Diagnostic Performance of Laypersons, Physicians, and AI-Augmented Physicians Across Clinical Complexity Levels

Shamsudeen, Mohamed Arsath; Ahmad, Arqam Mibsaam; Kazi, Faaiza; Kazi, Syed Faazil; Khanday, Ayesha Zaffer; Arif, Shifan

International Journal of Innovative Science and Research Technology (IJISRT), 2025, vol. 10, issue 07, 1048-1056

Abstract:

Background: Large language models (LLMs) such as ChatGPT are rapidly entering clinical contexts. While these models can generate fluent, guideline-aligned responses and perform well on exams, linguistic fluency does not equal clinical competence. Real-world medicine demands contextual reasoning, risk assessment, and value-sensitive decisions, skills that LLMs lack. Growing public access to LLMs raises safety concerns, particularly when untrained users interpret AI outputs as medical advice.

Objective: This study evaluated whether AI's clinical value depends on the expertise of its user. We compared three groups: laypersons using ChatGPT, physicians acting independently, and physicians using ChatGPT for decision support.

Methods: In a simulation-based study, 150 participants (50 per group) assessed 15 clinical cases of varying complexity. For each case, participants provided a diagnosis, a next step, and a brief justification. Responses were scored by blinded physicians using standardized rubrics. Analyses included ANOVA, effect size estimation, and content review of reasoning quality.

Results: Diagnostic accuracy was highest among physicians using ChatGPT (94.4%), followed by physicians alone (88.0%) and laypersons with ChatGPT (60.7%). Management quality mirrored this pattern. AI-assisted physicians submitted more comprehensive plans and took more time, suggesting deeper engagement. Laypersons often reproduced AI outputs uncritically, without contextual understanding, which raises safety risks.

Conclusion: AI does not equalize clinical skill; it magnifies it. When used by trained professionals, ChatGPT enhances diagnostic accuracy and decision quality. In untrained hands, it can lead to error and overconfidence. Integrating LLMs into healthcare demands thoughtful oversight, clinician training, and safeguards to prevent misuse. The most effective path is not AI replacing clinicians but AI augmenting them: supporting clinical judgment, not supplanting it.
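
To make the size of the reported accuracy gaps concrete, the following is a minimal sketch that compares the three group accuracies quoted in the abstract using percentage-point differences and Cohen's h. Cohen's h is an assumption chosen for illustration; the abstract mentions "effect size estimation" without naming the measure, so this is not necessarily the authors' method. The dictionary keys and function names are hypothetical.

```python
import math

# Diagnostic accuracies as reported in the abstract (proportion correct).
ACCURACY = {
    "physicians_with_chatgpt": 0.944,
    "physicians_alone": 0.880,
    "laypersons_with_chatgpt": 0.607,
}


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions.

    Assumption: used here only as an illustrative effect size for two
    proportions; the source does not state which measure was used.
    """
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def compare(group_a: str, group_b: str) -> dict:
    """Return the gap between two groups in percentage points and as Cohen's h."""
    p1, p2 = ACCURACY[group_a], ACCURACY[group_b]
    return {
        "diff_points": round((p1 - p2) * 100, 1),
        "cohens_h": round(cohens_h(p1, p2), 2),
    }


if __name__ == "__main__":
    pairs = [
        ("physicians_with_chatgpt", "physicians_alone"),
        ("physicians_alone", "laypersons_with_chatgpt"),
        ("physicians_with_chatgpt", "laypersons_with_chatgpt"),
    ]
    for a, b in pairs:
        print(a, "vs", b, compare(a, b))
```

Under this sketch, the gap between physicians and laypersons using ChatGPT is several times larger than the gain physicians obtain from ChatGPT, which is in line with the abstract's claim that AI magnifies existing skill rather than equalizing it.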

Keywords: Diagnostic Reasoning; Clinical Decision Support; Physician-AI Dyad; Health Technology Evaluation; Evidence-Based Medicine
Date: 2025

Downloads: (external link)
https://www.ijisrt.com/evaluating-diagnostic-perfo ... al-complexity-levels (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:cvr:ijisrt:2025:07:ijisrt25jul620

DOI: 10.38124/ijisrt/25jul620
