A Nearly Exhaustive Search for CpG Islands on Whole Chromosomes

Hsieh Fushing, Chen Shu-Chun and Pollard Katherine
Additional contact information
Hsieh Fushing: University of California, Davis
Chen Shu-Chun: Academia Sinica
Pollard Katherine: University of California, San Francisco

The International Journal of Biostatistics, 2009, vol. 5, issue 1, 24

Abstract: CpG islands are genome subsequences with an unexpectedly high number of CG dinucleotides. They are typically identified using filtering criteria (e.g., G+C content, observed-to-expected CpG ratio, and length) and are computed using sliding window methods. Most such studies mistakenly assume that the sliding window achieves an exhaustive search for CpG islands on the genome sequence of interest. We devise a Lexis diagram and explicitly show that filtering criteria-based definitions of CpG islands are mathematically incomplete and non-operational. These facts imply that sliding window methods frequently fail to identify a large percentage of the subsequences that meet the filtering criteria. We also demonstrate that an exhaustive search is computationally expensive. We develop the Hierarchical Factor Segmentation (HFS) algorithm, a pattern recognition technique with an adaptive model selection device, to overcome the incompleteness and non-operational drawbacks and to identify CpG islands with efficient computation. The concept of a CpG island "core" is introduced; the core is computed using the HFS algorithm and is independent of any specific filtering criteria. Starting from such a CpG island "core," a CpG island is constructed using a Lexis diagram. This two-step computational approach provides a nearly exhaustive search for CpG islands that can be practically implemented on whole chromosomes. In a simulation study that realistically mimics CpG island dynamics through a Hidden Markov Model, we demonstrate that this approach retains very high sensitivity and specificity, that is, very low rates of false negatives and false positives. Finally, we apply the HFS algorithm to identify CpG island cores on human chromosome 21.
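
The abstract's claim that a fixed sliding window can miss subsequences satisfying the filtering criteria can be illustrated with a small sketch. This is not the authors' HFS algorithm or their Lexis diagram construction. It is only a toy comparison of a fixed-length sliding window against a brute-force enumeration of all subsequences. The function names, the toy sequence, and the thresholds (minimum length 10, G+C fraction at least 0.6, observed-to-expected CpG ratio at least 0.6) are assumptions chosen for illustration and do not come from the paper. The observed-to-expected ratio is assumed here to be (#CG x length) / (#C x #G).

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed-to-expected CpG ratio, assumed as (#CG * length) / (#C * #G)."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def meets_criteria(seq, min_len, min_gc, min_oe):
    """True if seq satisfies all three (hypothetical) filtering thresholds."""
    return (len(seq) >= min_len
            and gc_fraction(seq) >= min_gc
            and cpg_obs_exp(seq) >= min_oe)


def sliding_window_hits(seq, window, step, min_len, min_gc, min_oe):
    """Test only fixed-length windows, as a sliding window method does."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        end = start + window
        if meets_criteria(seq[start:end], min_len, min_gc, min_oe):
            hits.append((start, end))
    return hits


def exhaustive_hits(seq, min_len, min_gc, min_oe):
    """Test every subsequence of length >= min_len.

    The number of candidate (start, end) pairs grows quadratically with
    the sequence length, which is why this is impractical on chromosomes.
    """
    n = len(seq)
    hits = []
    for start in range(n - min_len + 1):
        for end in range(start + min_len, n + 1):
            if meets_criteria(seq[start:end], min_len, min_gc, min_oe):
                hits.append((start, end))
    return hits


# Toy sequence: a 10-base CG-rich stretch embedded in an AT background.
toy = "AT" * 10 + "CG" * 5 + "AT" * 10
thresholds = dict(min_len=10, min_gc=0.6, min_oe=0.6)

window_hits = sliding_window_hits(toy, window=20, step=1, **thresholds)
all_hits = exhaustive_hits(toy, **thresholds)

print("sliding window hits:", len(window_hits))
print("exhaustive hits:", len(all_hits))
```

In this toy case every 20-base window contains at most 10 G or C bases, so no window reaches the assumed 0.6 G+C threshold, while the brute-force search does find qualifying subsequences such as positions 20 to 30. The brute-force search is only feasible here because the sequence is 50 bases long.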

Keywords: AIC and BIC model selection criteria; non-parametric decoding; filtering criteria; hierarchical factor segmentation; human chromosome 21; mathematical incompleteness; methylation
Date: 2009
Citations: View citations in EconPapers (6)

Downloads: (external link)
https://doi.org/10.2202/1557-4679.1158 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.

Persistent link: https://EconPapers.repec.org/RePEc:bpj:ijbist:v:5:y:2009:i:1:n:14

Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/ijb/html

DOI: 10.2202/1557-4679.1158

The International Journal of Biostatistics is currently edited by Antoine Chambaz, Alan E. Hubbard and Mark J. van der Laan

Bibliographic data for series maintained by Peter Golla.

 
Page updated 2025-03-19
Handle: RePEc:bpj:ijbist:v:5:y:2009:i:1:n:14