The exact distribution of the k-tuple statistic for sequence homology
W. Y. Wendy Lou
Statistics & Probability Letters, 2003, vol. 61, issue 1, 51-59
Abstract:
The distribution theory of runs and patterns has become increasingly useful in the field of biological sequence homology. One important application in detecting tandem duplications among DNA sequence segments is the k-tuple statistic S_{n,k}, the sum of matches in matching-runs of length k or longer in a sequence of n i.i.d. Bernoulli trials with success/matching probability p. Current approaches to this distribution problem are based on various approximations, due mainly to the numerical complexity of computing the exact distribution using a straightforward combinatorial approach. In this paper, we obtain a simple and efficient expression for the exact distribution of S_{n,k} using the principle of finite Markov chain imbedding. Our numerical results illustrate most importantly that for pattern lengths in the range n = 10 to 100, a range commonly used in detecting DNA tandem repeats, the distribution, in general, is highly skewed and far from normal.
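To make the definition of the k-tuple statistic concrete, here is a minimal Python sketch that computes its exact distribution by stepping a small Markov chain through the n trials. The state space used here (current run length capped at k, paired with the accumulated sum) and the function name `k_tuple_distribution` are assumptions made for illustration; this is not necessarily the imbedding or the closed-form expression derived in the paper.

```python
def k_tuple_distribution(n, k, p):
    """Return a list dist with dist[s] = P(S_{n,k} = s) for s = 0..n.

    Illustrative sketch only: S_{n,k} is taken to be the total number of
    matches lying in matching-runs of length k or longer, over n i.i.d.
    Bernoulli(p) trials.
    """
    q = 1.0 - p
    # state[r][s]: probability that the current run of matches has length r
    # (r == k means "k or longer", already counted) and the sum so far is s.
    state = [[0.0] * (n + 1) for _ in range(k + 1)]
    state[0][0] = 1.0
    for _ in range(n):
        nxt = [[0.0] * (n + 1) for _ in range(k + 1)]
        for r in range(k + 1):
            for s in range(n + 1):
                prob = state[r][s]
                if prob == 0.0:
                    continue
                # Mismatch: the current run ends, the sum is unchanged.
                nxt[0][s] += prob * q
                if r < k - 1:
                    # Match, but the run is still shorter than k.
                    nxt[r + 1][s] += prob * p
                elif r == k - 1:
                    # Match that brings the run to length k: count all k.
                    nxt[k][s + k] += prob * p
                else:
                    # Match extending a run that is already counted.
                    nxt[k][s + 1] += prob * p
        state = nxt
    # Marginalize over the run-length component.
    return [sum(state[r][s] for r in range(k + 1)) for s in range(n + 1)]


if __name__ == "__main__":
    print(k_tuple_distribution(3, 2, 0.5))
```

For n = 3, k = 2, p = 0.5 the eight equally likely sequences give the distribution [0.625, 0.0, 0.25, 0.125], which the sketch reproduces. The work grows roughly as n² · k, so the range n = 10 to 100 mentioned in the abstract is inexpensive to tabulate this way.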
Keywords: DNA sequence matching; Markov chain; Runs and patterns
Date: 2003
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167-7152(02)00337-1
Full text for ScienceDirect subscribers only
Persistent link: https://EconPapers.repec.org/RePEc:eee:stapro:v:61:y:2003:i:1:p:51-59