A whole-slide foundation model for digital pathology from real-world data

Hanwen Xu, Naoto Usuyama, Jaspreet Bagga, Sheng Zhang, Rajesh Rao, Tristan Naumann, Cliff Wong, Zelalem Gero, Javier González, Yu Gu, Yanbo Xu, Mu Wei, Wenhui Wang, Shuming Ma, Furu Wei, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Jaylen Rosemon, Tucker Bower, Soohee Lee, Roshanthi Weerasinghe, Bill J. Wright, Ari Robicsek, Brian Piening, Carlo Bifulco, Sheng Wang and Hoifung Poon
Additional contact information
Hanwen Xu: Microsoft Research
Naoto Usuyama: Microsoft Research
Jaspreet Bagga: Microsoft Research
Sheng Zhang: Microsoft Research
Rajesh Rao: Microsoft Research
Tristan Naumann: Microsoft Research
Cliff Wong: Microsoft Research
Zelalem Gero: Microsoft Research
Javier González: Microsoft Research
Yu Gu: Microsoft Research
Yanbo Xu: Microsoft Research
Mu Wei: Microsoft Research
Wenhui Wang: Microsoft Research
Shuming Ma: Microsoft Research
Furu Wei: Microsoft Research
Jianwei Yang: Microsoft Research
Chunyuan Li: Microsoft Research
Jianfeng Gao: Microsoft Research
Jaylen Rosemon: Providence Genomics
Tucker Bower: Providence Genomics
Soohee Lee: Providence Research Network
Roshanthi Weerasinghe: Providence Research Network
Bill J. Wright: Providence Research Network
Ari Robicsek: Providence Research Network
Brian Piening: Providence Genomics
Carlo Bifulco: Providence Genomics
Sheng Wang: University of Washington
Hoifung Poon: Microsoft Research

Nature, 2024, vol. 630, issue 8015, 181-188

Abstract: Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles [1–3]. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing the important slide-level context [4]. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. The slides originated from more than 30,000 patients covering 31 major tissue types. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer architecture for pretraining gigapixel pathology slides. To scale GigaPath for slide-level learning with tens of thousands of image tiles, GigaPath adapts the newly developed LongNet [5] method to digital pathology. To evaluate Prov-GigaPath, we construct a digital pathology benchmark comprising 9 cancer subtyping tasks and 17 pathomics tasks, using both Providence and TCGA data [6]. With large-scale pretraining and ultra-large-context modelling, Prov-GigaPath attains state-of-the-art performance on 25 out of 26 tasks, with significant improvement over the second-best method on 18 tasks. We further demonstrate the potential of Prov-GigaPath on vision–language pretraining for pathology [7,8] by incorporating the pathology reports. In sum, Prov-GigaPath is an open-weight foundation model that achieves state-of-the-art performance on various digital pathology tasks, demonstrating the importance of real-world data and whole-slide modelling.

Date: 2024

Downloads: (external link)
https://www.nature.com/articles/s41586-024-07441-w Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:630:y:2024:i:8015:d:10.1038_s41586-024-07441-w

Ordering information: This journal article can be ordered from
https://www.nature.com/

DOI: 10.1038/s41586-024-07441-w

Nature is currently edited by Magdalena Skipper

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:nat:nature:v:630:y:2024:i:8015:d:10.1038_s41586-024-07441-w