CustOmics: A versatile deep-learning based strategy for multi-omics integration

Benkirane, Hakim; Pradat, Yoann; Michiels, Stefan; Cournède, Paul-Henry

PLOS Computational Biology, 2023, vol. 19, issue 3, 1-19

Abstract: The availability of patient cohorts with several types of omics data opens new perspectives for exploring the disease’s underlying biological processes and developing predictive models. It also comes with new challenges in computational biology in terms of integrating high-dimensional and heterogeneous data in a fashion that captures the interrelationships between multiple genes and their functions. Deep learning methods offer promising perspectives for integrating multi-omics data. In this paper, we review the existing integration strategies based on autoencoders and propose a new customizable one whose principle relies on a two-phase approach. In the first phase, we adapt the training to each data source independently before learning cross-modality interactions in the second phase. By taking into account each source’s singularity, we show that this approach succeeds at taking advantage of all the sources more efficiently than other strategies. Moreover, by adapting our architecture to the computation of Shapley additive explanations, our model can provide interpretable results in a multi-source setting. Using multiple omics sources from different TCGA cohorts, we demonstrate the performance of the proposed method for cancer on test cases for several tasks, such as the classification of tumor types and breast cancer subtypes, as well as survival outcome prediction. We show through our experiments the strong performance of our architecture on seven different datasets of various sizes and provide some interpretations of the results obtained. Our code is available at https://github.com/HakimBenkirane/CustOmics.

Author summary: Cancer is a complex disease involving multiple genetic and environmental factors. Those factors affect biological systems on many levels. To better characterize a patient’s molecular profile, we need to rely on multiple dimensions simultaneously, for example, genomics, transcriptomics, and epigenomics data. However, those data types are very different, making their integration challenging because of the high heterogeneity between the sources. Moreover, while defining a model architecture that can take any type of input source, we need to tackle the issue of the generalizability of the integration, as different combinations of omic sources can behave differently due to discrepancies in data types and dimensionality. In light of those challenges, we developed a new integration strategy and framework called CustOmics to help scientists integrate multiple omics data. Our results show that this new integration method outperforms the state-of-the-art deep learning methods for multi-omic integration in classification and survival tasks.
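
Illustrative sketch: The two-phase idea described in the abstract (train on each source independently, then learn cross-modality interactions) can be illustrated with a toy example. This is not the authors' implementation, which is in the repository linked above. It assumes plain linear autoencoders trained by gradient descent on synthetic data, and it assumes that the second phase is a central autoencoder over the concatenated per-source codes. All function names and parameters (make_toy_sources, train_linear_autoencoder, two_phase_integration, the latent sizes, the learning rate) are hypothetical.

```python
import numpy as np


def make_toy_sources(n_samples=60, seed=0):
    """Two synthetic 'omics' sources driven by the same 3 hidden factors."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=(n_samples, 3))
    sources = []
    for n_features in (20, 12):
        mixing = rng.normal(size=(3, n_features))
        x = shared @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))
        # Standardize each feature so the sources are on a comparable scale.
        x = (x - x.mean(axis=0)) / x.std(axis=0)
        sources.append(x)
    return sources


def train_linear_autoencoder(x, latent_dim, steps=500, lr=0.02, seed=0):
    """Fit x ~ (x @ w_enc) @ w_dec by gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    w_enc = rng.normal(scale=0.1, size=(n_features, latent_dim))
    w_dec = rng.normal(scale=0.1, size=(latent_dim, n_features))
    losses = []
    for _ in range(steps):
        z = x @ w_enc
        err = z @ w_dec - x
        losses.append(float(np.mean(err ** 2)))
        scale = 2.0 / err.size  # derivative of the mean of squared errors
        grad_dec = scale * (z.T @ err)
        grad_enc = scale * (x.T @ (err @ w_dec.T))
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec, losses


def two_phase_integration(sources, source_latent_dim=4, joint_latent_dim=3):
    # Phase 1 (assumed form): one autoencoder per source, trained independently.
    codes, phase1_losses = [], []
    for i, x in enumerate(sources):
        w_enc, _, losses = train_linear_autoencoder(x, source_latent_dim, seed=i)
        codes.append(x @ w_enc)
        phase1_losses.append(losses)
    # Phase 2 (assumed form): a central autoencoder over the concatenated codes,
    # so the joint representation can mix information across sources.
    stacked = np.concatenate(codes, axis=1)
    w_joint, _, phase2_losses = train_linear_autoencoder(
        stacked, joint_latent_dim, seed=len(sources)
    )
    return stacked @ w_joint, phase1_losses, phase2_losses
```

Calling two_phase_integration(make_toy_sources()) returns one joint embedding row per sample together with the loss history of each phase. Keeping the phases separate lets each source use its own latent size before any cross-source mixing happens, which is the design point the abstract emphasizes.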

Date: 2023

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010921 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 10921&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1010921

DOI: 10.1371/journal.pcbi.1010921
