Compositional spectrum—revealing patterns for genomic sequence characterization and comparison
Valery M. Kirzhner,
Abraham B. Korol,
Alexander Bolshoy and
Eviatar Nevo
Physica A: Statistical Mechanics and its Applications, 2002, vol. 312, issue 3, 447-457
Abstract:
In this paper we propose a natural approach to characterizing genomic sequences, based on occurrences of fixed-length words (strings over the alphabet {A,C,G,T}) from a sufficiently large set W of words that are, in the general case, arbitrary. According to our approach, any genomic sequence can be characterized by a histogram of the frequencies with which words from the set W match the sequence imperfectly; this histogram is called a compositional spectrum (CS). The specificity of CSs is manifest in a reasonable similarity of spectra obtained on different stretches of the same genome and, simultaneously, in a broad range of dissimilarities between the spectra of different genomes. The proposed approach may have various applications in intra- and intergenomic sequence comparisons.
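The idea of a compositional spectrum can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the word length, the size of W, the mismatch tolerance, or the measure used to compare spectra, so the Hamming-distance matching rule, the sum-of-absolute-differences distance, and every function and parameter name below (random_words, compositional_spectrum, max_mismatches, spectrum_distance) are assumptions made for illustration only.

```python
import random

ALPHABET = "ACGT"


def random_words(n_words, word_len, seed=0):
    """Draw an arbitrary word set W (assumption: uniform random words)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(ALPHABET) for _ in range(word_len))
            for _ in range(n_words)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def compositional_spectrum(seq, words, max_mismatches):
    """Normalized histogram of imperfect matches of each word in `words`.

    Assumption: a word "imperfectly matches" a window of the sequence when
    their Hamming distance is at most `max_mismatches`.
    """
    word_len = len(words[0])
    counts = [0] * len(words)
    for i in range(len(seq) - word_len + 1):
        window = seq[i:i + word_len]
        for j, word in enumerate(words):
            if hamming(window, word) <= max_mismatches:
                counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(words)
    return [c / total for c in counts]


def spectrum_distance(p, q):
    """Sum of absolute differences between two spectra (illustrative choice)."""
    return sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy stand-ins for two stretches of sequence; real genomes are not
    # uniformly random, so this only exercises the code path.
    stretch_a = "".join(rng.choice(ALPHABET) for _ in range(2000))
    stretch_b = "".join(rng.choice(ALPHABET) for _ in range(2000))
    W = random_words(n_words=50, word_len=8)
    cs_a = compositional_spectrum(stretch_a, W, max_mismatches=2)
    cs_b = compositional_spectrum(stretch_b, W, max_mismatches=2)
    print(spectrum_distance(cs_a, cs_b))
```

This brute-force scan costs time proportional to the sequence length times the size of W, which is adequate for a demonstration but would need indexing or vectorization for genome-scale input.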
Keywords: DNA sequences; Set of words; Sequence comparisons; Compositional spectra; Imperfect matching
Date: 2002
Download:
http://www.sciencedirect.com/science/article/pii/S0378437102008439
Persistent link: https://EconPapers.repec.org/RePEc:eee:phsmap:v:312:y:2002:i:3:p:447-457
DOI: 10.1016/S0378-4371(02)00843-9
Physica A: Statistical Mechanics and its Applications is currently edited by K. A. Dawson, J. O. Indekeu, H.E. Stanley and C. Tsallis