Disease variant prediction with deep generative models of evolutionary data

Jonathan Frazer, Pascal Notin, Mafalda Dias, Aidan Gomez, Joseph K. Min, Kelly Brock, Yarin Gal and Debora S. Marks
Additional contact information
Jonathan Frazer: Harvard Medical School
Pascal Notin: University of Oxford
Mafalda Dias: Harvard Medical School
Aidan Gomez: University of Oxford
Joseph K. Min: Harvard Medical School
Kelly Brock: Harvard Medical School
Yarin Gal: University of Oxford
Debora S. Marks: Harvard Medical School

Nature, 2021, vol. 599, issue 7883, 91-95

Abstract: Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences [1–3]. In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods [4–10] have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable [11]. Here we propose an approach that leverages deep generative models to predict variant pathogenicity without relying on labels. By modelling the distribution of sequence variation across organisms, we implicitly capture constraints on the protein sequences that maintain fitness. Our model EVE (evolutionary model of variant effect) not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification [12–16]. We predict the pathogenicity of more than 36 million variants across 3,219 disease genes and provide evidence for the classification of more than 256,000 variants of unknown significance. Our work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings.

Date: 2021

Downloads: (external link)
https://www.nature.com/articles/s41586-021-04043-8 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:599:y:2021:i:7883:d:10.1038_s41586-021-04043-8

Ordering information: This journal article can be ordered from
https://www.nature.com/

DOI: 10.1038/s41586-021-04043-8

Nature is currently edited by Magdalena Skipper

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:nat:nature:v:599:y:2021:i:7883:d:10.1038_s41586-021-04043-8