In-context learning enables multimodal large language models to classify cancer pathology images

Dyke Ferber, Georg Wölflein, Isabella C. Wiest, Marta Ligero, Srividhya Sainath, Narmin Ghaffari Laleh, Omar S. M. El Nahhas, Gustav Müller-Franzes, Dirk Jäger, Daniel Truhn and Jakob Nikolas Kather
Additional contact information
Dyke Ferber: Heidelberg University Hospital
Georg Wölflein: University of St Andrews
Isabella C. Wiest: Technical University Dresden
Marta Ligero: Technical University Dresden
Srividhya Sainath: Technical University Dresden
Narmin Ghaffari Laleh: Technical University Dresden
Omar S. M. El Nahhas: Technical University Dresden
Gustav Müller-Franzes: University Hospital Aachen
Dirk Jäger: Heidelberg University Hospital
Daniel Truhn: University Hospital Aachen
Jakob Nikolas Kather: Heidelberg University Hospital

Nature Communications, 2024, vol. 15, issue 1, 1-12

Abstract: Medical image classification requires labeled, task-specific datasets which are used to train deep learning networks de novo, or to fine-tune foundation models. However, this process is computationally and technically demanding. In language processing, in-context learning provides an alternative, where models learn from within prompts, bypassing the need for parameter updates. Yet, in-context learning remains underexplored in medical image analysis. Here, we systematically evaluate the model Generative Pretrained Transformer 4 with Vision capabilities (GPT-4V) on cancer image processing with in-context learning on three cancer histopathology tasks of high importance: classification of tissue subtypes in colorectal cancer, colon polyp subtyping, and breast tumor detection in lymph node sections. Our results show that in-context learning is sufficient to match or even outperform specialized neural networks trained for particular tasks, while requiring only a minimal number of samples. In summary, this study demonstrates that large vision language models trained on non-domain-specific data can be applied out of the box to solve medical image-processing tasks in histopathology. This democratizes access to generalist AI models for medical experts without a technical background, especially in areas where annotated data is scarce.
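For readers who want to experiment with the few-shot prompting approach the abstract describes, the sketch below illustrates in-context image classification through the OpenAI Python SDK: a handful of labeled example tiles are placed directly in the prompt alongside the query image, and no model parameters are updated. The file names, class labels, and model identifier are illustrative assumptions only and do not reproduce the paper's exact protocol or prompts.

import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def to_data_url(path):
    # Encode a local image as a base64 data URL for the chat API.
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()

# Few-shot demonstrations: (image path, ground-truth label) pairs placed in the prompt.
examples = [("tumor_example.png", "tumor"), ("normal_example.png", "normal tissue")]
query_image = "query_tile.png"

content = [{"type": "text",
            "text": "Classify each histopathology tile as 'tumor' or 'normal tissue'."}]
for path, label in examples:  # in-context examples, no fine-tuning
    content.append({"type": "image_url", "image_url": {"url": to_data_url(path)}})
    content.append({"type": "text", "text": f"Label: {label}"})
content.append({"type": "image_url", "image_url": {"url": to_data_url(query_image)}})
content.append({"type": "text", "text": "Label:"})

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable chat model; the paper evaluated GPT-4V
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)  # predicted class label

In this setup, adding or changing the labeled examples in the prompt is the only way the model "learns"; no weights are touched at any point, which is the property the study exploits.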

Date: 2024
Citations: 1 (in EconPapers)

Downloads: https://www.nature.com/articles/s41467-024-51465-9 (abstract, text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-51465-9

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-024-51465-9

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-51465-9