Modeling ChIP Sequencing In Silico with Applications

Zhang, Zhengdong D; Rozowsky, Joel; Snyder, Michael; Chang, Joseph; Gerstein, Mark

Modeling ChIP Sequencing In Silico with Applications

Zhengdong D Zhang, Joel Rozowsky, Michael Snyder, Joseph Chang and Mark Gerstein

PLOS Computational Biology, 2008, vol. 4, issue 8, 1-10

Abstract: ChIP sequencing (ChIP-seq) is a new method for genomewide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate sequencing depth, both the genomic background and the binding sites must be properly modeled. To develop a computational foundation to tackle these issues, we first performed a study to characterize the observed statistical nature of this new type of high-throughput data. By linking sequence tags into clusters, we show that there are two components to the distribution of tag counts observed in a number of recent experiments: an initial power-law distribution and a subsequent long right tail. Then we develop in silico ChIP-seq, a computational method to simulate the experimental outcome by placing tags onto the genome according to particular assumed distributions for the actual binding sites and for the background genomic sequence. In contrast to current assumptions, our results show that both the background and the binding sites need to have a markedly nonuniform distribution in order to correctly model the observed ChIP-seq data, with, for instance, the background tag counts modeled by a gamma distribution. On the basis of these results, we extend an existing scoring approach by using a more realistic genomic-background model. This enables us to identify transcription-factor binding sites in ChIP-seq data in a statistically rigorous fashion.Author Summary: ChIP-seq is an apt combination of chromosome immunoprecipitation and next-generation sequencing to identify transcription factor binding sites in vivo on the whole-genome scale. Since its advent, this new method has generated much excitement in the field of functional genomics. Proper computational modeling of the ChIP-seq process is needed for both data scoring and determination of adequate sequencing depth, as it provides the computational foundation for analyzing ChIP-seq data. In our study, we show the characteristics of ChIP-seq data and present in silico ChIP sequencing, a computational method to simulate the experimental outcome. On the basis of our data characterization, we observed transcription factor binding sites with excessive enrichment of sequence tags. Our simulation results reveal that both the genomic background and the binding sites are not uniform. On the basis of our simulation results, we propose a statistical procedure using the more realistic genomic background model to identify binding sites in ChIP-seq data.

Date: 2008
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (2)

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000158 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 00158&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1000158

DOI: 10.1371/journal.pcbi.1000158

Access Statistics for this article

More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol ().