Semi-automated assembly of high-quality diploid human reference genomes

Erich D. Jarvis, Giulio Formenti, Arang Rhie, Andrea Guarracino, Chentao Yang, Jonathan Wood, Alan Tracey, Francoise Thibaud-Nissen, Mitchell R. Vollger, David Porubsky, Haoyu Cheng, Mobin Asri, Glennis A. Logsdon, Paolo Carnevali, Mark J. P. Chaisson, Chen-Shan Chin, Sarah Cody, Joanna Collins, Peter Ebert, Merly Escalona, Olivier Fedrigo, Robert S. Fulton, Lucinda L. Fulton, Shilpa Garg, Jennifer L. Gerton, Jay Ghurye, Anastasiya Granat, Richard E. Green, William Harvey, Patrick Hasenfeld, Alex Hastie, Marina Haukness, Erich B. Jaeger, Miten Jain, Melanie Kirsche, Mikhail Kolmogorov, Jan O. Korbel, Sergey Koren, Jonas Korlach, Joyce Lee, Daofeng Li, Tina Lindsay, Julian Lucas, Feng Luo, Tobias Marschall, Matthew W. Mitchell, Jennifer McDaniel, Fan Nie, Hugh E. Olsen, Nathan D. Olson, Trevor Pesout, Tamara Potapova, Daniela Puiu, Allison Regier, Jue Ruan, Steven L. Salzberg, Ashley D. Sanders, Michael C. Schatz, Anthony Schmitt, Valerie A. Schneider, Siddarth Selvaraj, Kishwar Shafin, Alaina Shumate, Nathan O. Stitziel, Catherine Stober, James Torrance, Justin Wagner, Jianxin Wang, Aaron Wenger, Chuanle Xiao, Aleksey V. Zimin, Guojie Zhang, Ting Wang, Heng Li, Erik Garrison, David Haussler, Ira Hall, Justin M. Zook, Evan E. Eichler, Adam M. Phillippy, Benedict Paten, Kerstin Howe and Karen H. Miga
Additional contact information
Erich D. Jarvis: The Rockefeller University
Giulio Formenti: The Rockefeller University
Arang Rhie: National Institutes of Health
Andrea Guarracino: Viale Rita Levi-Montalcini
Chentao Yang: BGI-Shenzhen
Jonathan Wood: Wellcome Sanger Institute
Alan Tracey: Wellcome Sanger Institute
Francoise Thibaud-Nissen: National Institutes of Health
Mitchell R. Vollger: University of Washington School of Medicine
David Porubsky: University of Washington School of Medicine
Haoyu Cheng: Dana-Farber Cancer Institute
Mobin Asri: University of California
Glennis A. Logsdon: University of Washington School of Medicine
Paolo Carnevali: Chan Zuckerberg Initiative
Mark J. P. Chaisson: University of Southern California
Chen-Shan Chin: Foundation for Biological Data Science
Sarah Cody: Washington University School of Medicine
Joanna Collins: Wellcome Sanger Institute
Peter Ebert: Heinrich Heine University
Merly Escalona: University of California Santa Cruz
Olivier Fedrigo: The Rockefeller University
Robert S. Fulton: Washington University School of Medicine
Lucinda L. Fulton: Washington University School of Medicine
Shilpa Garg: University of Copenhagen
Jennifer L. Gerton: Stowers Institute for Medical Research
Jay Ghurye: Dovetail Genomics
Anastasiya Granat: Illumina, Inc.
Richard E. Green: University of California
William Harvey: University of Washington School of Medicine
Patrick Hasenfeld: Genome Biology Unit
Alex Hastie: Bionano Genomics
Marina Haukness: University of California
Erich B. Jaeger: Illumina, Inc.
Miten Jain: University of California
Melanie Kirsche: Johns Hopkins University
Mikhail Kolmogorov: University of California San Diego
Jan O. Korbel: Genome Biology Unit
Sergey Koren: National Institutes of Health
Jonas Korlach: Pacific Biosciences
Joyce Lee: Bionano Genomics
Daofeng Li: Washington University School of Medicine
Tina Lindsay: Washington University School of Medicine
Julian Lucas: University of California
Feng Luo: Clemson University
Tobias Marschall: Heinrich Heine University
Matthew W. Mitchell: Coriell Institute for Medical Research
Jennifer McDaniel: National Institute of Standards and Technology
Fan Nie: Central South University
Hugh E. Olsen: University of California
Nathan D. Olson: National Institute of Standards and Technology
Trevor Pesout: University of California
Tamara Potapova: Stowers Institute for Medical Research
Daniela Puiu: Johns Hopkins University
Allison Regier: DNAnexus
Jue Ruan: Chinese Academy of Agricultural Sciences
Steven L. Salzberg: Johns Hopkins University
Ashley D. Sanders: Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC)
Michael C. Schatz: Johns Hopkins University
Anthony Schmitt: Arima Genomics
Valerie A. Schneider: National Institutes of Health
Siddarth Selvaraj: Arima Genomics
Kishwar Shafin: University of California
Alaina Shumate: Johns Hopkins University
Nathan O. Stitziel: Washington University School of Medicine
Catherine Stober: Genome Biology Unit
James Torrance: Wellcome Sanger Institute
Justin Wagner: National Institute of Standards and Technology
Jianxin Wang: Central South University
Aaron Wenger: Pacific Biosciences
Chuanle Xiao: Sun Yat-sen University
Aleksey V. Zimin: Johns Hopkins University
Guojie Zhang: Zhejiang University School of Medicine
Ting Wang: Washington University School of Medicine
Heng Li: Dana-Farber Cancer Institute
Erik Garrison: University of Tennessee Health Science Center
David Haussler: Howard Hughes Medical Institute
Ira Hall: Yale School of Medicine
Justin M. Zook: National Institute of Standards and Technology
Evan E. Eichler: Howard Hughes Medical Institute
Adam M. Phillippy: National Institutes of Health
Benedict Paten: University of California
Kerstin Howe: Wellcome Sanger Institute
Karen H. Miga: University of California

Nature, 2022, vol. 611, issue 7936, 519-531

Abstract: The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

Date: 2022

Downloads:
https://www.nature.com/articles/s41586-022-05325-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.


Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:611:y:2022:i:7936:d:10.1038_s41586-022-05325-5

Ordering information: This journal article can be ordered from
https://www.nature.com/

DOI: 10.1038/s41586-022-05325-5

Nature is currently edited by Magdalena Skipper

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:nat:nature:v:611:y:2022:i:7936:d:10.1038_s41586-022-05325-5