Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C

Zev N. Kronenberg, Arang Rhie, Sergey Koren, Gregory T. Concepcion, Paul Peluso, Katherine M. Munson, David Porubsky, Kristen Kuhn, Kathryn A. Mueller, Wai Yee Low, Stefan Hiendleder, Olivier Fedrigo, Ivan Liachko, Richard J. Hall, Adam M. Phillippy, Evan E. Eichler, John L. Williams, Timothy P. L. Smith, Erich D. Jarvis, Shawn T. Sullivan and Sarah B. Kingan
Additional contact information
Zev N. Kronenberg: Phase Genomics
Arang Rhie: National Human Genome Research Institute
Sergey Koren: National Human Genome Research Institute
Gregory T. Concepcion: Pacific Biosciences
Paul Peluso: Pacific Biosciences
Katherine M. Munson: University of Washington School of Medicine
David Porubsky: University of Washington School of Medicine
Kristen Kuhn: Clay Center
Kathryn A. Mueller: Phase Genomics
Wai Yee Low: The University of Adelaide
Stefan Hiendleder: The University of Adelaide
Olivier Fedrigo: The Rockefeller University
Ivan Liachko: Phase Genomics
Richard J. Hall: Pacific Biosciences
Adam M. Phillippy: National Human Genome Research Institute
Evan E. Eichler: University of Washington School of Medicine
John L. Williams: The University of Adelaide
Timothy P. L. Smith: Clay Center
Erich D. Jarvis: The Rockefeller University
Shawn T. Sullivan: Phase Genomics
Sarah B. Kingan: Pacific Biosciences

Nature Communications, 2021, vol. 12, issue 1, 1-10

Abstract: Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, these assemblies have been best created with complex protocols, such as cultured cells that contain a single-haplotype (haploid) genome, single cells where haplotypes are separated, or co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend phase blocks of partially-phased diploid assemblies to chromosome or scaffold scale. FALCON-Phase uses the inherent phasing information in Hi-C reads, skipping variant calling, and reduces the computational complexity of phasing. Our method is validated on three benchmark datasets generated as part of the Vertebrate Genomes Project (VGP), including human, cow, and zebra finch, for which high-quality, fully haplotype-resolved assemblies are available using the trio-based approach. FALCON-Phase is accurate without having parental data, and performance is better in samples with higher heterozygosity. For cow and zebra finch the accuracy is 97% compared to 80–91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.

Date: 2021

Downloads: (external link)
https://www.nature.com/articles/s41467-020-20536-y Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-020-20536-y

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-020-20536-y

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-020-20536-y