Uniform genomic data analysis in the NCI Genomic Data Commons
Zhenyu Zhang,
Kyle Hernandez,
Jeremiah Savage,
Shenglai Li,
Dan Miller,
Stuti Agrawal,
Francisco Ortuno,
Louis M. Staudt,
Allison Heath and
Robert L. Grossman
Additional contact information
Zhenyu Zhang: Center for Translational Data Science, University of Chicago
Kyle Hernandez: Center for Translational Data Science, University of Chicago
Jeremiah Savage: Center for Translational Data Science, University of Chicago
Shenglai Li: Center for Translational Data Science, University of Chicago
Dan Miller: Center for Translational Data Science, University of Chicago
Stuti Agrawal: Center for Translational Data Science, University of Chicago
Francisco Ortuno: Center for Translational Data Science, University of Chicago
Louis M. Staudt: National Cancer Institute
Allison Heath: Children’s Hospital of Philadelphia
Robert L. Grossman: Center for Translational Data Science, University of Chicago
Nature Communications, 2021, vol. 12, issue 1, 1-11
Abstract:
The goal of the National Cancer Institute’s (NCI’s) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. The initial GDC dataset includes genomic, epigenomic, proteomic, clinical and other data from the NCI TCGA and TARGET programs. Data production for the GDC started in June 2015 using an OpenStack-based private cloud. By June 2016, the GDC had analyzed more than 50,000 raw sequencing data inputs, as well as multiple other data types. Using the latest human genome reference build, GRCh38, the GDC generated a variety of data types, from aligned reads to somatic mutations, gene expression, miRNA expression, DNA methylation status, and copy number variation. In this paper, we describe the pipelines and workflows used to process and harmonize the data in the GDC. The generated data, as well as the original input files from TCGA and TARGET, are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive (https://gdc.cancer.gov/).
Date: 2021
Downloads:
https://www.nature.com/articles/s41467-021-21254-9 Abstract (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-21254-9
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-021-21254-9
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.