Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression

Poole, William; Leinonen, Kalle; Shmulevich, Ilya; Knijnenburg, Theo A; Bernard, Brady

PLOS Computational Biology, 2017, vol. 13, issue 2, 1-26

Abstract: Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or the single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable-length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.

Author summary: Identifying driver mutations in cancer has been a major challenge in cancer research, with the ultimate goal of understanding the detailed molecular origins of cancer and providing genetically personalized treatments. For decades, the cancer research community has known that mutations in certain genes, such as tumor suppressors like P53, can drive cancer. In some cases it is also clear that mutations within cancer genes are localized to a single amino acid, such as the V600E mutation in BRAF. With the existence of large multi-omic data sets including The Cancer Genome Atlas (TCGA), it is now possible to apply big data approaches towards both identifying mutation features of interest and understanding their functional consequences. We have bridged the gap between single amino acid mutations and the whole gene view by developing an algorithm that can identify variable-length regions within cancer genes that are enriched for mutations. Furthermore, we have been able to integrate our multiscale mutation clusters with additional molecular data to gain insight into possible functional consequences of the clusters.

Date: 2017

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005347 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 05347&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1005347

DOI: 10.1371/journal.pcbi.1005347
