Automated real-world data integration improves cancer outcome prediction

Justin Jee, Christopher Fong, Karl Pichotta, Thinh Ngoc Tran, Anisha Luthra, Michele Waters, Chenlian Fu, Mirella Altoe, Si-Yang Liu, Steven B. Maron, Mehnaj Ahmed, Susie Kim, Mono Pirun, Walid K. Chatila, Ino Bruijn, Arfath Pasha, Ritika Kundra, Benjamin Gross, Brooke Mastrogiacomo, Tyler J. Aprati, David Liu, JianJiong Gao, Marzia Capelletti, Kelly Pekala, Lisa Loudon, Maria Perry, Chaitanya Bandlamudi, Mark Donoghue, Baby Anusha Satravada, Axel Martin, Ronglai Shen, Yuan Chen, A. Rose Brannon, Jason Chang, Lior Braunstein, Anyi Li, Anton Safonov, Aaron Stonestrom, Pablo Sanchez-Vela, Clare Wilhelm, Mark Robson, Howard Scher, Marc Ladanyi, Jorge S. Reis-Filho, David B. Solit, David R. Jones, Daniel Gomez, Helena Yu, Debyani Chakravarty, Rona Yaeger, Wassim Abida, Wungki Park, Eileen M. O’Reilly, Julio Garcia-Aguilar, Nicholas Socci, Francisco Sanchez-Vega, Jian Carrot-Zhang, Peter D. Stetson, Ross Levine, Charles M. Rudin, Michael F. Berger, Sohrab P. Shah, Deborah Schrag, Pedram Razavi, Kenneth L. Kehl, Bob T. Li, Gregory J. Riely and Nikolaus Schultz
Additional contact information
Justin Jee: Memorial Sloan Kettering Cancer Center
Christopher Fong: Memorial Sloan Kettering Cancer Center
Karl Pichotta: Memorial Sloan Kettering Cancer Center
Thinh Ngoc Tran: Memorial Sloan Kettering Cancer Center
Anisha Luthra: Memorial Sloan Kettering Cancer Center
Michele Waters: Memorial Sloan Kettering Cancer Center
Chenlian Fu: Memorial Sloan Kettering Cancer Center
Mirella Altoe: Memorial Sloan Kettering Cancer Center
Si-Yang Liu: Memorial Sloan Kettering Cancer Center
Steven B. Maron: Memorial Sloan Kettering Cancer Center
Mehnaj Ahmed: Memorial Sloan Kettering Cancer Center
Susie Kim: Memorial Sloan Kettering Cancer Center
Mono Pirun: Memorial Sloan Kettering Cancer Center
Walid K. Chatila: Memorial Sloan Kettering Cancer Center
Ino Bruijn: Memorial Sloan Kettering Cancer Center
Arfath Pasha: Memorial Sloan Kettering Cancer Center
Ritika Kundra: Memorial Sloan Kettering Cancer Center
Benjamin Gross: Memorial Sloan Kettering Cancer Center
Brooke Mastrogiacomo: Memorial Sloan Kettering Cancer Center
Tyler J. Aprati: Dana Farber Cancer Institute
David Liu: Dana Farber Cancer Institute
JianJiong Gao: Caris Life Sciences
Marzia Capelletti: Caris Life Sciences
Kelly Pekala: Memorial Sloan Kettering Cancer Center
Lisa Loudon: Memorial Sloan Kettering Cancer Center
Maria Perry: Memorial Sloan Kettering Cancer Center
Chaitanya Bandlamudi: Memorial Sloan Kettering Cancer Center
Mark Donoghue: Memorial Sloan Kettering Cancer Center
Baby Anusha Satravada: Memorial Sloan Kettering Cancer Center
Axel Martin: Memorial Sloan Kettering Cancer Center
Ronglai Shen: Memorial Sloan Kettering Cancer Center
Yuan Chen: Memorial Sloan Kettering Cancer Center
A. Rose Brannon: Memorial Sloan Kettering Cancer Center
Jason Chang: Memorial Sloan Kettering Cancer Center
Lior Braunstein: Memorial Sloan Kettering Cancer Center
Anyi Li: Memorial Sloan Kettering Cancer Center
Anton Safonov: Memorial Sloan Kettering Cancer Center
Aaron Stonestrom: Memorial Sloan Kettering Cancer Center
Pablo Sanchez-Vela: Memorial Sloan Kettering Cancer Center
Clare Wilhelm: Memorial Sloan Kettering Cancer Center
Mark Robson: Memorial Sloan Kettering Cancer Center
Howard Scher: Memorial Sloan Kettering Cancer Center
Marc Ladanyi: Memorial Sloan Kettering Cancer Center
Jorge S. Reis-Filho: Memorial Sloan Kettering Cancer Center
David B. Solit: Memorial Sloan Kettering Cancer Center
David R. Jones: Memorial Sloan Kettering Cancer Center
Daniel Gomez: Memorial Sloan Kettering Cancer Center
Helena Yu: Memorial Sloan Kettering Cancer Center
Debyani Chakravarty: Memorial Sloan Kettering Cancer Center
Rona Yaeger: Memorial Sloan Kettering Cancer Center
Wassim Abida: Memorial Sloan Kettering Cancer Center
Wungki Park: Memorial Sloan Kettering Cancer Center
Eileen M. O’Reilly: Memorial Sloan Kettering Cancer Center
Julio Garcia-Aguilar: Memorial Sloan Kettering Cancer Center
Nicholas Socci: Memorial Sloan Kettering Cancer Center
Francisco Sanchez-Vega: Memorial Sloan Kettering Cancer Center
Jian Carrot-Zhang: Memorial Sloan Kettering Cancer Center
Peter D. Stetson: Memorial Sloan Kettering Cancer Center
Ross Levine: Memorial Sloan Kettering Cancer Center
Charles M. Rudin: Memorial Sloan Kettering Cancer Center
Michael F. Berger: Memorial Sloan Kettering Cancer Center
Sohrab P. Shah: Memorial Sloan Kettering Cancer Center
Deborah Schrag: Memorial Sloan Kettering Cancer Center
Pedram Razavi: Memorial Sloan Kettering Cancer Center
Kenneth L. Kehl: Dana Farber Cancer Institute
Bob T. Li: Memorial Sloan Kettering Cancer Center
Gregory J. Riely: Memorial Sloan Kettering Cancer Center
Nikolaus Schultz: Memorial Sloan Kettering Cancer Center

Nature, 2024, vol. 636, issue 8043, 728-736

Abstract: The digitization of health records and growing availability of tumour DNA sequencing provide an opportunity to study the determinants of cancer outcomes with unprecedented richness. Patient data are often stored in unstructured text and siloed datasets. Here we combine natural language processing annotations [1,2] with structured medication, patient-reported demographic, tumour registry and tumour genomic data from 24,950 patients at Memorial Sloan Kettering Cancer Center to generate a clinicogenomic, harmonized oncologic real-world dataset (MSK-CHORD). MSK-CHORD includes data for non-small-cell lung (n = 7,809), breast (n = 5,368), colorectal (n = 5,543), prostate (n = 3,211) and pancreatic (n = 3,109) cancers and enables discovery of clinicogenomic relationships not apparent in smaller datasets. Leveraging MSK-CHORD to train machine learning models to predict overall survival, we find that models including features derived from natural language processing, such as sites of disease, outperform those based on genomic data or stage alone as tested by cross-validation and an external, multi-institution dataset. By annotating 705,241 radiology reports, MSK-CHORD also uncovers predictors of metastasis to specific organ sites, including a relationship between SETD2 mutation and lower metastatic potential in immunotherapy-treated lung adenocarcinoma corroborated in independent datasets. We demonstrate the feasibility of automated annotation from unstructured notes and its utility in predicting patient outcomes. The resulting data are provided as a public resource for real-world oncologic research.

Date: 2024

Downloads: (external link)
https://www.nature.com/articles/s41586-024-08167-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:636:y:2024:i:8043:d:10.1038_s41586-024-08167-5

Ordering information: This journal article can be ordered from
https://www.nature.com/

DOI: 10.1038/s41586-024-08167-5

Nature is currently edited by Magdalena Skipper

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:nat:nature:v:636:y:2024:i:8043:d:10.1038_s41586-024-08167-5