EconPapers    
Transfer learning enables predictions in network biology

Christina V. Theodoris (), Ling Xiao, Anant Chopra, Mark D. Chaffin, Zeina R. Al Sayed, Matthew C. Hill, Helene Mantineo, Elizabeth M. Brydon, Zexian Zeng, X. Shirley Liu and Patrick T. Ellinor ()
Additional contact information
Christina V. Theodoris: Dana-Farber Cancer Institute
Ling Xiao: Broad Institute of MIT and Harvard
Anant Chopra: Bayer US LLC
Mark D. Chaffin: Broad Institute of MIT and Harvard
Zeina R. Al Sayed: Broad Institute of MIT and Harvard
Matthew C. Hill: Broad Institute of MIT and Harvard
Helene Mantineo: Broad Institute of MIT and Harvard
Elizabeth M. Brydon: Bayer US LLC
Zexian Zeng: Dana-Farber Cancer Institute
X. Shirley Liu: Dana-Farber Cancer Institute
Patrick T. Ellinor: Broad Institute of MIT and Harvard

Nature, 2023, vol. 618, issue 7965, 616-624

Abstract: Mapping gene networks requires large amounts of transcriptomic data to learn the connections between genes, which impedes discoveries in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding [1,2] and computer vision [3] by leveraging deep learning models pretrained on large-scale general datasets that can then be fine-tuned towards a vast array of downstream tasks with limited task-specific data. Here, we developed a context-aware, attention-based deep learning model, Geneformer, pretrained on a large-scale corpus of about 30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. Fine-tuning towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modelling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.
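The transfer-learning paradigm the abstract describes — pretrain a representation on a large general corpus, then fine-tune only a small task head on limited labeled data — can be sketched in miniature. The following is an illustrative toy in NumPy, not Geneformer's actual code: the frozen random projection stands in for pretrained transformer weights, and all names, dimensions, and the synthetic labels are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained encoder: a frozen projection "learned elsewhere".
# In the real setting this would be e.g. Geneformer's pretrained transformer.
D_IN, D_EMB = 20, 8
W_pretrained = rng.normal(size=(D_IN, D_EMB)) / np.sqrt(D_IN)

def encode(x):
    """Frozen 'pretrained' encoder: project raw features to embeddings."""
    return np.tanh(x @ W_pretrained)

# Tiny labeled downstream dataset (the "limited task-specific data" case).
n = 40
X = rng.normal(size=(n, D_IN))
w_true = rng.normal(size=D_EMB)
y = (encode(X) @ w_true > 0).astype(float)  # synthetic binary labels

# Fine-tune only a small logistic-regression head; the encoder stays frozen.
H = encode(X)                # embeddings computed once, never updated
w, b = np.zeros(D_EMB), 0.0
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(H @ w + b)))   # sigmoid
    grad = p - y                             # d(log-loss)/d(logit)
    w -= lr * (H.T @ grad) / n
    b -= lr * grad.mean()

acc = ((H @ w + b > 0) == (y == 1)).mean()
print(f"training accuracy of fine-tuned head: {acc:.2f}")
```

The point of the design is that only `w` and `b` are updated during fine-tuning, so very few labeled examples suffice — the heavy lifting was done during pretraining of the (here simulated) encoder.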

Date: 2023
Citations: View citations in EconPapers (2)

Downloads: (external link)
https://www.nature.com/articles/s41586-023-06139-9 Abstract (text/html)
Access to the full text of the articles in this series is restricted.



Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:618:y:2023:i:7965:d:10.1038_s41586-023-06139-9

Ordering information: This journal article can be ordered from
https://www.nature.com/

DOI: 10.1038/s41586-023-06139-9


Nature is currently edited by Magdalena Skipper

More articles in Nature from Nature
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

Page updated 2025-03-19
Handle: RePEc:nat:nature:v:618:y:2023:i:7965:d:10.1038_s41586-023-06139-9