SAAS-CNV: A Joint Segmentation Approach on Aggregated and Allele Specific Signals for the Identification of Somatic Copy Number Alterations with Next-Generation Sequencing Data
Zhongyang Zhang and
Ke Hao
PLOS Computational Biology, 2015, vol. 11, issue 11, 1-27
Abstract:
Cancer genomes exhibit profound somatic copy number alterations (SCNAs). Studying tumor SCNAs using massively parallel sequencing provides unprecedented resolution and meanwhile gives rise to new challenges in data analysis, complicated by tumor aneuploidy and heterogeneity as well as normal cell contamination. While the majority of read depth based methods utilize total sequencing depth alone for SCNA inference, the allele specific signals are undervalued. We proposed a joint segmentation and inference approach using both signals to meet some of the challenges. Our method consists of four major steps: 1) extracting read depth supporting reference and alternative alleles at each SNP/Indel locus and comparing the total read depth and alternative allele proportion between tumor and matched normal sample; 2) performing joint segmentation on the two signal dimensions; 3) correcting the copy number baseline from which the SCNA state is determined; 4) calling SCNA state for each segment based on both signal dimensions. The method is applicable to whole exome/genome sequencing (WES/WGS) as well as SNP array data in a tumor-control study. We applied the method to a dataset containing no SCNAs to test the specificity, created by pairing sequencing replicates of a single HapMap sample as normal/tumor pairs, as well as a large-scale WGS dataset consisting of 88 liver tumors along with adjacent normal tissues. Compared with representative methods, our method demonstrated improved accuracy, scalability to large cancer studies, capability in handling both sequencing and SNP array data, and the potential to improve the estimation of tumor ploidy and purity.
Author Summary:
Somatic copy number alterations (SCNAs) are essential in oncogenesis and progression of a variety of cancers. Accurate identification and quantification of SCNAs are fundamental in the effort to catalog the different variants in cancer genomes. This task has its own challenges due to the complex nature of the tumor SCNA profile and is further complicated by the heterogeneity of the cells collected from a tumor tissue and the contamination from adjacent normal cells, making it difficult for methods well tailored to the detection of germline copy number variation (CNV) to fit tumor SCNA detection. Next generation sequencing provides an opportunity to comprehensively characterize SCNAs at unprecedented resolution. While total read depth information is commonly used in SCNA detection methods, the allele-specific read depth is less often considered, leading to sub-optimal solutions. By incorporating both pieces of information, we developed a segmentation-based pipeline to address the aforementioned issues in SCNA detection. This tool is applicable to both deep sequencing data and SNP array data and enables accurate and efficient characterization of the genome-wide SCNA profile to facilitate large-scale cancer studies.
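Illustration (not part of the published abstract): step 1 of the method compares total read depth and alternative allele proportion between tumor and matched normal at each locus. The sketch below shows one plausible way to derive two such per-locus signals from allele counts. The names `Locus`, `mirrored_alt_fraction`, and `locus_signals`, the library-size scaling by summed depth, and the folding of the allele fraction around 0.5 are all assumptions made for this example and are not necessarily the definitions used in the paper or its software.

```python
import math
from typing import List, NamedTuple, Tuple


class Locus(NamedTuple):
    """Read counts at one SNP/indel locus (hypothetical container)."""
    normal_ref: int
    normal_alt: int
    tumor_ref: int
    tumor_alt: int


def mirrored_alt_fraction(ref: int, alt: int) -> float:
    """Alternative allele proportion folded around 0.5, so it lies in [0.5, 1]."""
    frac = alt / (ref + alt)
    return max(frac, 1.0 - frac)


def locus_signals(loci: List[Locus]) -> List[Tuple[float, float]]:
    """Return (log2 depth ratio, log2 mirrored-allele-fraction ratio) per locus.

    Assumption: the summed depth over all loci serves as a crude
    library-size proxy for normalizing tumor against normal.
    """
    for locus in loci:
        if locus.normal_ref + locus.normal_alt == 0 or locus.tumor_ref + locus.tumor_alt == 0:
            raise ValueError("every locus needs non-zero depth in both samples")

    normal_sum = sum(l.normal_ref + l.normal_alt for l in loci)
    tumor_sum = sum(l.tumor_ref + l.tumor_alt for l in loci)
    scale = normal_sum / tumor_sum  # rescales tumor depth to the normal library size

    signals = []
    for l in loci:
        normal_depth = l.normal_ref + l.normal_alt
        tumor_depth = l.tumor_ref + l.tumor_alt
        # Signal 1: aggregated (total) depth, tumor versus normal.
        log2_ratio = math.log2(tumor_depth * scale / normal_depth)
        # Signal 2: allele-specific imbalance, tumor versus normal.
        log2_mbaf = math.log2(
            mirrored_alt_fraction(l.tumor_ref, l.tumor_alt)
            / mirrored_alt_fraction(l.normal_ref, l.normal_alt)
        )
        signals.append((log2_ratio, log2_mbaf))
    return signals


# A balanced locus, then a locus that keeps its total depth but loses one allele.
print(locus_signals([Locus(50, 50, 50, 50), Locus(50, 50, 100, 0)]))
# prints [(0.0, 0.0), (0.0, 1.0)]
```

The second locus in the usage example has an unchanged total depth, so a depth-only signal reports nothing, while the allele-specific signal shifts. This is the kind of event that motivates segmenting on both signals jointly.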
Date: 2015
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004618 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 04618&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1004618
DOI: 10.1371/journal.pcbi.1004618
More articles in PLOS Computational Biology from Public Library of Science