Retrieval-Augmented Language Models for Clinical Decision Support in the Classification of Inborn Errors of Immunity

Arzehgar, Afrooz; Varasteh Yazdi, Saeed; Ahanchian, Hamid; Eslami, Saeid

Afrooz Arzehgar, Saeed Varasteh Yazdi, Hamid Ahanchian and Saeid Eslami
Additional contact information
Afrooz Arzehgar: Mashhad University of Medical Sciences (Iran, Mashhad) - MUMS
Saeed Varasteh Yazdi: EM - EMLyon Business School
Hamid Ahanchian: Mashhad University of Medical Sciences (Iran, Mashhad) - MUMS
Saeid Eslami: UvA - Universiteit van Amsterdam = University of Amsterdam

Post-Print from HAL

Abstract: Early diagnosis of inborn errors of immunity (IEIs) can improve patient outcomes and reduce healthcare costs. However, it faces challenges such as clinical complexity, low awareness, and limited resources. Generative artificial intelligence has attracted considerable global attention in medical domains, particularly when integrated into clinical decision support systems (CDSS), as it has the potential to facilitate data interpretation, clinical reasoning, and the optimal use of knowledge resources. Preliminary studies have explored the potential of large language models (LLMs) in various information retrieval tasks, but a systematic evaluation of LLMs with and without retrieval mechanisms for IEI classification is still lacking. We evaluated and compared the validity and reliability of the responses generated by four open-source and closed-source LLMs, in their baseline form and with augmented data, across 169 IEI patient records, using two input scenarios and four prompt templates. Our primary finding was that the models varied in reliability and performance. The most reliable models were Gemini-1.5-Pro and Llama-3.1-8B-Instruct (K = 0.98), and the best-performing model without data augmentation was Gemini, with an F1 score of 43.39 % ± 0.10. The results also showed that retrieval strategies improved the average classification performance, increasing the F1 score from 34 % to 53 % across all models. DeepSeek-R1, which reasoned over retrieved information through the integration of quality refinement and structured retrieval, achieved the best weighted F1 score of 66.94 % ± 1.19. The study highlights the effective use of generative AI and retrieval-augmented models as a decision support tool for IEI classification. However, incorporating retrieval systems into clinical decision-making processes requires adequate input, effective prompt engineering, and the adoption of retrieval strategies.

Keywords: Inborn errors of immunity; Primary immunodeficiency; Large language models; Retrieval-augmented generation; Clinical decision support
Date: 2026-06-05
Note: View the original document on HAL open archive server: https://hal.science/hal-05656789v1

Published in Journal of Clinical Immunology, in press, ⟨10.1007/s10875-026-02035-9⟩

Downloads: (external link)
https://hal.science/hal-05656789v1/document (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-05656789

DOI: 10.1007/s10875-026-02035-9
