Implicit bias in safety-aligned large language models: A multi-faceted evaluation of clinical decision-making and health equity

Jia, Qiufeng; Wen, Yuhang; Liu, Yuyan; Zhao, Hui; Yu, Qiongge; Long, Yu; Sun, Dan; Yu, Yufeng

PLOS ONE, 2026, vol. 21, issue 5, 1-18

Abstract:
Background: Large language models are increasingly integrated into healthcare for clinical decision support and patient communication. Although these models can pass explicit social bias tests, they may retain implicit biases (latent associations between social groups and attributes) that could influence medical judgment.
Objective: To systematically evaluate the presence, magnitude, and behavioral impact of implicit biases in large language models within the medical domain across six high-stakes categories: gender, race, socioeconomic status, health conditions, religion, and healthcare systems.
Design: A descriptive cross-sectional study using a multi-faceted evaluation framework.
Setting(s): Computational analysis of 10 mainstream global large language models, including proprietary models (ChatGPT-4o, Gemini-2.0-Flash) and open-source models (DeepSeek-V3, Qwen3).
Methods: We constructed 24 medical bias datasets across six categories. Bias was assessed using three methods: (1) the Large Language Model Word Association Test, a prompt-based method for revealing implicit biases; (2) the Large Language Model Relative Decision Test, a strategy for detecting subtle discrimination in situational decision-making; and (3) Paired-Prompt Analysis, which examines whether implicit associations predict discriminatory decisions.
Results: All 10 models exhibited systematic implicit biases (Mean IAT Bias > 0) across all categories, with the strongest biases observed in Race (Mean = 0.61) and Socioeconomic Status (Mean = 0.56). Advanced reasoning capabilities (Chain-of-Thought) did not significantly reduce bias magnitude. Crucially, stronger implicit associations significantly predicted discriminatory choices in downstream medical decision tasks (p
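
The abstract names the tests but not their scoring rules. As a rough illustration only, the following sketch assumes an IAT-style word-association score of the kind used in earlier LLM word-association work, plus a simple two-option decision score. The formulas, the data format, and the names `association_bias` and `decision_bias` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical data format (an assumption, not the paper's): each pairing is
# (group, attribute), where group "a" or "b" is the social-group term in the
# prompt and attribute "A" or "B" is the attribute set the model assigned it to,
# e.g. "A" = favourable clinical attributes, "B" = unfavourable ones.
Pairing = Tuple[str, str]


def association_bias(pairings: Iterable[Pairing]) -> float:
    """IAT-style association score in [-1, 1] (assumed formula).

    N(a,A)/(N(a,A)+N(a,B)) + N(b,B)/(N(b,A)+N(b,B)) - 1.
    Positive values mean group a is linked to A and group b to B more often
    than chance; 0 means no preference; negative values mean the reverse.
    """
    counts = Counter(pairings)  # missing keys count as 0
    n_a = counts[("a", "A")] + counts[("a", "B")]
    n_b = counts[("b", "A")] + counts[("b", "B")]
    if n_a == 0 or n_b == 0:
        raise ValueError("need at least one pairing for each group")
    return counts[("a", "A")] / n_a + counts[("b", "B")] / n_b - 1.0


def decision_bias(choices: Iterable[bool]) -> float:
    """Hypothetical relative-decision score in [-1, 1].

    Each entry is True when the model picked the stereotype-consistent option
    in a two-candidate scenario; 0 means neither option was favoured.
    """
    picks: List[bool] = list(choices)
    if not picks:
        raise ValueError("no decisions to score")
    return 2.0 * sum(picks) / len(picks) - 1.0


if __name__ == "__main__":
    # Toy responses: group a paired with A twice and B once; group b the mirror.
    toy = [("a", "A"), ("a", "A"), ("a", "B"), ("b", "B"), ("b", "B"), ("b", "A")]
    print(round(association_bias(toy), 2))  # 0.33
    print(decision_bias([True, True, True, False]))  # 0.5
```

Under this assumed scoring, a per-category mean bias would simply be the average of such scores over items or models, and a paired-prompt comparison would relate each item's association score to the model's choice on the matching decision scenario. How the authors actually operationalize these steps is described in the full article, not in this sketch.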

Date: 2026

Downloads:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0348819 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 48819&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0348819

DOI: 10.1371/journal.pone.0348819

Publisher: Public Library of Science (PLOS ONE)