A deep catalogue of protein-coding variation in 983,578 individuals

Kathie Y. Sun, Xiaodong Bai, Siying Chen, Suying Bao, Chuanyi Zhang, Manav Kapoor, Joshua Backman, Tyler Joseph, Evan Maxwell, George Mitra, Alexander Gorovits, Adam Mansfield, Boris Boutkov, Sujit Gokhale, Lukas Habegger, Anthony Marcketta, Adam E. Locke, Liron Ganel, Alicia Hawes, Michael D. Kessler, Deepika Sharma, Jeffrey Staples, Jonas Bovijn, Sahar Gelfman, Alessandro Gioia, Veera M. Rajagopal, Alexander Lopez, Jennifer Rico Varela, Jesús Alegre-Díaz, Jaime Berumen, Roberto Tapia-Conyer, Pablo Kuri-Morales, Jason Torres, Jonathan Emberson, Rory Collins, Michael Cantor, Timothy Thornton, Hyun Min Kang, John D. Overton, Alan R. Shuldiner, M. Laura Cremona, Mona Nafde, Aris Baras, Gonçalo Abecasis, Jonathan Marchini, Jeffrey G. Reid, William Salerno and Suganthi Balasubramanian
Additional contact information
Kathie Y. Sun: Regeneron Genetics Center
Xiaodong Bai: Regeneron Genetics Center
Siying Chen: Regeneron Genetics Center
Suying Bao: Regeneron Genetics Center
Chuanyi Zhang: Regeneron Genetics Center
Manav Kapoor: Regeneron Genetics Center
Joshua Backman: Regeneron Genetics Center
Tyler Joseph: Regeneron Genetics Center
Evan Maxwell: Regeneron Genetics Center
George Mitra: Regeneron Genetics Center
Alexander Gorovits: Regeneron Genetics Center
Adam Mansfield: Regeneron Genetics Center
Boris Boutkov: Regeneron Genetics Center
Sujit Gokhale: Regeneron Genetics Center
Lukas Habegger: Regeneron Genetics Center
Anthony Marcketta: Regeneron Genetics Center
Adam E. Locke: Regeneron Genetics Center
Liron Ganel: Regeneron Genetics Center
Alicia Hawes: Regeneron Genetics Center
Michael D. Kessler: Regeneron Genetics Center
Deepika Sharma: Regeneron Genetics Center
Jeffrey Staples: Regeneron Genetics Center
Jonas Bovijn: Regeneron Genetics Center
Sahar Gelfman: Regeneron Genetics Center
Alessandro Gioia: Regeneron Genetics Center
Veera M. Rajagopal: Regeneron Genetics Center
Alexander Lopez: Regeneron Genetics Center
Jennifer Rico Varela: Regeneron Genetics Center
Jesús Alegre-Díaz: National Autonomous University of Mexico (UNAM)
Jaime Berumen: National Autonomous University of Mexico (UNAM)
Roberto Tapia-Conyer: National Autonomous University of Mexico (UNAM)
Pablo Kuri-Morales: National Autonomous University of Mexico (UNAM)
Jason Torres: University of Oxford
Jonathan Emberson: University of Oxford
Rory Collins: University of Oxford
Michael Cantor: Regeneron Genetics Center
Timothy Thornton: Regeneron Genetics Center
Hyun Min Kang: Regeneron Genetics Center
John D. Overton: Regeneron Genetics Center
Alan R. Shuldiner: Regeneron Genetics Center
M. Laura Cremona: Regeneron Genetics Center
Mona Nafde: Regeneron Genetics Center
Aris Baras: Regeneron Genetics Center
Gonçalo Abecasis: Regeneron Genetics Center
Jonathan Marchini: Regeneron Genetics Center
Jeffrey G. Reid: Regeneron Genetics Center
William Salerno: Regeneron Genetics Center
Suganthi Balasubramanian: Regeneron Genetics Center

Nature, 2024, vol. 631, issue 8021, 583-592

Abstract: Rare coding variants that substantially affect function provide insights into the biology of a gene [1–3]. However, ascertaining the frequency of such variants requires large sample sizes [4–8]. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.

Date: 2024

Downloads: (external link)
https://www.nature.com/articles/s41586-024-07556-0 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:631:y:2024:i:8021:d:10.1038_s41586-024-07556-0

Ordering information: This journal article can be ordered from
https://www.nature.com/

DOI: 10.1038/s41586-024-07556-0

Nature is currently edited by Magdalena Skipper

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:nat:nature:v:631:y:2024:i:8021:d:10.1038_s41586-024-07556-0