Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects
Allison A. Regier,
Yossi Farjoun,
David E. Larson,
Olga Krasheninina,
Hyun Min Kang,
Daniel P. Howrigan,
Bo-Juen Chen,
Manisha Kher,
Eric Banks,
Darren C. Ames,
Adam C. English,
Heng Li,
Jinchuan Xing,
Yeting Zhang,
Tara Matise,
Goncalo R. Abecasis,
Will Salerno,
Michael C. Zody,
Benjamin M. Neale and
Ira M. Hall
Additional contact information
Allison A. Regier: McDonnell Genome Institute, Washington University School of Medicine
Yossi Farjoun: Broad Institute of MIT and Harvard
David E. Larson: McDonnell Genome Institute, Washington University School of Medicine
Olga Krasheninina: Human Genome Sequencing Center, Baylor College of Medicine
Hyun Min Kang: University of Michigan
Daniel P. Howrigan: Broad Institute of MIT and Harvard
Bo-Juen Chen: New York Genome Center
Manisha Kher: New York Genome Center
Eric Banks: Broad Institute of MIT and Harvard
Darren C. Ames: DNAnexus Inc
Adam C. English: Spiral Genetics
Heng Li: Broad Institute of MIT and Harvard
Jinchuan Xing: Rutgers University
Yeting Zhang: Rutgers University
Tara Matise: Rutgers University
Goncalo R. Abecasis: University of Michigan
Will Salerno: Human Genome Sequencing Center, Baylor College of Medicine
Michael C. Zody: New York Genome Center
Benjamin M. Neale: Broad Institute of MIT and Harvard
Ira M. Hall: McDonnell Genome Institute, Washington University School of Medicine
Nature Communications, 2018, vol. 9, issue 1, 1-8
Abstract:
Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.
Date: 2018
Downloads: https://www.nature.com/articles/s41467-018-06159-4 Abstract (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:9:y:2018:i:1:d:10.1038_s41467-018-06159-4
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-018-06159-4
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie