Diverse ancestral representation improves genetic intolerance metrics

Alexander L. Han, Chloe F. Sands, Dorota Matelska, Jessica C. Butts, Vida Ravanmehr, Fengyuan Hu, Esmeralda Villavicencio Gonzalez, Nicholas Katsanis, Carlos D. Bustamante, Quanli Wang, Slavé Petrovski, Dimitrios Vitsios and Ryan S. Dhindsa
Additional contact information
Alexander L. Han: Baylor College of Medicine
Chloe F. Sands: Baylor College of Medicine
Dorota Matelska: AstraZeneca
Jessica C. Butts: Rice University
Vida Ravanmehr: Baylor College of Medicine
Fengyuan Hu: AstraZeneca
Esmeralda Villavicencio Gonzalez: Texas Children’s Hospital
Nicholas Katsanis: Galatea Bio, Inc
Carlos D. Bustamante: Galatea Bio, Inc
Quanli Wang: AstraZeneca
Slavé Petrovski: AstraZeneca
Dimitrios Vitsios: AstraZeneca
Ryan S. Dhindsa: Baylor College of Medicine

Nature Communications, 2025, vol. 16, issue 1, 1-9

Abstract: The unprecedented scale of genomic databases has revolutionized our ability to identify regions in the human genome intolerant to variation, regions often implicated in disease. However, these datasets remain constrained by limited ancestral diversity. Here, we analyze whole-exome sequencing data from 460,551 UK Biobank and 125,748 Genome Aggregation Database (gnomAD) participants across multiple ancestries to test several key intolerance metrics, including the Residual Variance Intolerance Score (RVIS), Missense Tolerance Ratio (MTR), and Loss-of-Function Observed/Expected ratio (LOF O/E). We demonstrate that increasing ancestral representation, rather than sample size alone, critically drives their performance. Scores trained on variation observed in African and Admixed American ancestral groups show higher resolution in detecting haploinsufficient and neurodevelopmental disease risk genes compared to scores trained on European ancestry groups. Most strikingly, MTR trained on 43,000 multi-ancestry exomes demonstrates greater predictive power than when trained on a nearly 10-fold larger dataset of 440,000 non-Finnish European exomes. We further find that European ancestry group-based scores are likely approaching saturation. These findings highlight the need for enhanced population representation in genomic resources to fully realize the potential of precision medicine and drug discovery. Ancestry group-specific scores are publicly available through an interactive portal: http://intolerance.public.cgr.astrazeneca.com/

Date: 2025

Downloads: (external link)
https://www.nature.com/articles/s41467-025-57885-5 Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-57885-5

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-025-57885-5

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-04-02
Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-57885-5