Effect of sequence depth and length in long-read assembly of the maize inbred NC358
Shujun Ou,
Jianing Liu,
Kapeel M. Chougule,
Arkarachai Fungtammasan,
Arun S. Seetharam,
Joshua C. Stein,
Victor Llaca,
Nancy Manchanda,
Amanda M. Gilbert,
Sharon Wei,
Chen-Shan Chin,
David E. Hufnagel,
Sarah Pedersen,
Samantha J. Snodgrass,
Kevin Fengler,
Margaret Woodhouse,
Brian P. Walenz,
Sergey Koren,
Adam M. Phillippy,
Brett T. Hannigan,
R. Kelly Dawe,
Candice N. Hirsch,
Matthew B. Hufford and
Doreen Ware
Additional contact information
Shujun Ou: Iowa State University
Jianing Liu: University of Georgia
Kapeel M. Chougule: Cold Spring Harbor Laboratory
Arkarachai Fungtammasan: Mountain View
Arun S. Seetharam: Iowa State University
Joshua C. Stein: Cold Spring Harbor Laboratory
Victor Llaca: Applied Science and Technology, Corteva Agriscience™
Nancy Manchanda: Iowa State University
Amanda M. Gilbert: University of Minnesota
Sharon Wei: Cold Spring Harbor Laboratory
Chen-Shan Chin: Mountain View
David E. Hufnagel: Iowa State University
Sarah Pedersen: Iowa State University
Samantha J. Snodgrass: Iowa State University
Kevin Fengler: Applied Science and Technology, Corteva Agriscience™
Margaret Woodhouse: USDA ARS Corn Insects and Crop Genetics Research Unit
Brian P. Walenz: National Institutes of Health
Sergey Koren: National Institutes of Health
Adam M. Phillippy: National Institutes of Health
Brett T. Hannigan: Mountain View
R. Kelly Dawe: University of Georgia
Candice N. Hirsch: University of Minnesota
Matthew B. Hufford: Iowa State University
Doreen Ware: Cold Spring Harbor Laboratory
Nature Communications, 2020, vol. 11, issue 1, 1-10
Abstract:
Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20 to 75× genomic depth and with N50 subread lengths of 11–21 kb. Assemblies with ≤30× depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20× depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.
Date: 2020
Downloads: (external link)
https://www.nature.com/articles/s41467-020-16037-7 Abstract (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:11:y:2020:i:1:d:10.1038_s41467-020-16037-7
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-020-16037-7
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.