Machine learning-based tissue of origin classification for cancer of unknown primary diagnostics using genome-wide mutation features
Luan Nguyen,
Arne Hoeck and
Edwin Cuppen ()
Additional contact information
Luan Nguyen: Universiteitsweg 100
Arne Hoeck: Universiteitsweg 100
Edwin Cuppen: Universiteitsweg 100
Nature Communications, 2022, vol. 13, issue 1, 1-12
Abstract:
Abstract Cancers of unknown primary (CUP) origin account for ∼3% of all cancer diagnoses, whereby the tumor tissue of origin (TOO) cannot be determined. Using a uniformly processed dataset encompassing 6756 whole-genome sequenced primary and metastatic tumors, we develop Cancer of Unknown Primary Location Resolver (CUPLR), a random forest TOO classifier that employs 511 features based on simple and complex somatic driver and passenger mutations. CUPLR distinguishes 35 cancer (sub)types with ∼90% recall and ∼90% precision based on cross-validation and test set predictions. We find that structural variant derived features increase the performance and utility for classifying specific cancer types. With CUPLR, we could determine the TOO for 82/141 (58%) of CUP patients. Although CUPLR is based on machine learning, it provides a human interpretable graphical report with detailed feature explanations. The comprehensive output of CUPLR complements existing histopathological procedures and can enable improved diagnostics for CUP patients.
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
https://www.nature.com/articles/s41467-022-31666-w Abstract (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-31666-w
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-022-31666-w
Access Statistics for this article
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().