Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data
Sunduz Keles,
Mark van der Laan,
Sandrine Dudoit and
Simon Cawley
Additional contact information
Mark van der Laan: Division of Biostatistics, School of Public Health, University of California, Berkeley
Sandrine Dudoit: Division of Biostatistics, School of Public Health, University of California, Berkeley
Simon Cawley: Affymetrix, 3380 Central Expressway, Santa Clara, CA 95051
No 1147, U.C. Berkeley Division of Biostatistics Working Paper Series from Berkeley Electronic Press
Abstract:
Cawley et al. (2004) have recently mapped the locations of binding sites for three transcription factors along human chromosomes 21 and 22 using ChIP-Chip experiments. ChIP-Chip experiments are a new approach to the genome-wide identification of transcription factor binding sites. They consist of chromatin (Ch) immunoprecipitation (IP) of transcription factor-bound genomic DNA, followed by high density oligonucleotide hybridization (Chip) of the IP-enriched DNA. We investigate the ChIP-Chip data structure and propose methods for inferring the location of transcription factor binding sites from these data. The proposed methods test, for each probe, whether it is part of a bound sequence, using a scan statistic that takes into account the spatial structure of the data. Different multiple testing procedures are considered for controlling the family-wise error rate and the false discovery rate. A nested-Bonferroni adjustment, which is more powerful than the traditional Bonferroni adjustment when the test statistics are dependent, is also discussed. Simulation studies show that taking into account the spatial structure of the data substantially improves the sensitivity of the multiple testing procedures. Application of the proposed methods to ChIP-Chip data for transcription factor p53 identified many potential target binding regions along human chromosomes 21 and 22. Among these identified regions, 18% fall within 3 kb of the 5'UTR of a known gene or CpG island, and 31% fall between the codon start site and the codon end site of a known gene but not inside an exon. More than half of these potential target sequences contain the p53 consensus binding site or very close matches to it. Moreover, these target segments include the 13 experimentally verified p53 binding regions of Cawley et al. (2004), as well as 49 additional regions that show a higher hybridization signal than these 13 experimentally verified regions.
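The general idea of a spatially aware scan statistic combined with multiple testing adjustments can be illustrated with a minimal sketch. This is not the authors' procedure: the window average, the standard normal null, the simulated probe values, and all function names below are assumptions made for illustration, and the nested-Bonferroni adjustment is not reproduced. The ordinary Bonferroni adjustment stands in for family-wise error rate control and the Benjamini-Hochberg step-up adjustment stands in for false discovery rate control.

```python
import math
import random


def normal_sf(z):
    """Upper-tail probability of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def scan_statistics(probe_stats, half_width):
    """Pool each probe's statistic with its neighbours along the chromosome.

    Assumption: probe-level statistics are independent N(0, 1) under the null,
    so the window sum divided by sqrt(window size) is again N(0, 1).
    """
    n = len(probe_stats)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = probe_stats[lo:hi]
        out.append(sum(window) / math.sqrt(len(window)))
    return out


def bonferroni(pvals):
    """Bonferroni-adjusted p-values (family-wise error rate control)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Simulated data (hypothetical): 200 probes, with probes 90..109 "bound".
random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(200)]
for i in range(90, 110):
    signal[i] += 2.0

scan = scan_statistics(signal, half_width=5)
pvals = [normal_sf(z) for z in scan]      # one-sided: enrichment only
fwer_adj = bonferroni(pvals)
fdr_adj = benjamini_hochberg(pvals)

flagged_fwer = [i for i, p in enumerate(fwer_adj) if p <= 0.05]
flagged_fdr = [i for i, p in enumerate(fdr_adj) if p <= 0.05]
print(len(flagged_fwer), len(flagged_fdr))
```

Overlapping windows make neighbouring scan statistics dependent. The Bonferroni adjustment remains valid under arbitrary dependence but can be conservative, which is the setting in which the abstract describes the nested-Bonferroni adjustment as more powerful.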
Keywords: ChIP-Chip data; chromatin immunoprecipitation; high density oligonucleotide array; transcription factor; binding site; p53; multiple testing; scan statistic; family-wise error rate; tail probability of the proportion of false positives; false discovery rate; cross-validation; model selection
Date: 2004-07-11
Note: oai:bepress.com:ucbbiostat-1147
Downloads: (external link)
http://www.bepress.com/cgi/viewcontent.cgi?article=1147&context=ucbbiostat (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:bep:ucbbio:1147