Generalizable and automated classification of TNM stage from pathology reports with external validation
Jenna Kefeli,
Jacob Berkowitz,
Jose M. Acitores Cortina,
Kevin K. Tsang and
Nicholas P. Tatonetti ()
Additional contact information
Jenna Kefeli: Columbia University
Jacob Berkowitz: Cedars-Sinai Medical Center
Jose M. Acitores Cortina: Cedars-Sinai Medical Center
Kevin K. Tsang: Cedars-Sinai Medical Center
Nicholas P. Tatonetti: Columbia University
Nature Communications, 2024, vol. 15, issue 1, 1-7
Abstract:
Abstract Cancer staging is an essential clinical attribute informing patient prognosis and clinical trial eligibility. However, it is not routinely recorded in structured electronic health records. Here, we present BB-TEN: Big Bird – TNM staging Extracted from Notes, a generalizable method for the automated classification of TNM stage directly from pathology report text. We train a BERT-based model using publicly available pathology reports across approximately 7000 patients and 23 cancer types. We explore the use of different model types, with differing input sizes, parameters, and model architectures. Our final model goes beyond term-extraction, inferring TNM stage from context when it is not included in the report text explicitly. As external validation, we test our model on almost 8000 pathology reports from Columbia University Medical Center, finding that our trained model achieved an AU-ROC of 0.815–0.942. This suggests that our model can be applied broadly to other institutions without additional institution-specific fine-tuning.
Date: 2024
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.nature.com/articles/s41467-024-53190-9 Abstract (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-53190-9
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-024-53190-9
Access Statistics for this article
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().