The Relative Inefficiency of Sequence Weights Approaches in Determining a Nucleotide Position Weight Matrix

Newberg Lee A, McCue Lee Ann and Lawrence Charles E
Additional contact information
Newberg Lee A: NYSDOH Wadsworth Center & Rensselaer Polytechnic Institute Department of Computer Science
McCue Lee Ann: NYSDOH Wadsworth Center
Lawrence Charles E: NYSDOH Wadsworth Center & Brown University

Statistical Applications in Genetics and Molecular Biology, 2005, vol. 4, issue 1, 18

Abstract: Approaches based upon sequence weights, to construct a position weight matrix of nucleotides from aligned inputs, are popular, but little effort has been expended to measure their quality. We derive optimal sequence weights that minimize the sum of the variances of the estimators of base frequency parameters for sequences related by a phylogenetic tree. Using these, we find that approaches based upon sequence weights can perform very poorly in comparison to approaches based upon a theoretically optimal maximum-likelihood method in the inference of the parameters of a position-weight matrix. Specifically, we find that among a collection of primate sequences, even an optimal sequence-weights approach is only 51% as efficient as the maximum-likelihood approach in inferences of base frequency parameters. We also show how to employ the variance estimators to obtain a greedy ordering of species for sequencing. Application of this ordering for the weighted estimators to a primate collection yields a curve with a long plateau that is not observed with maximum-likelihood estimators. This plateau indicates that the use of weighted estimators on these data seriously limits the utility of obtaining the sequences of more than two or three additional species.
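
Illustration (not from the article): the idea of variance-minimizing sequence weights can be sketched with a generic result from linear estimation. Assume each sequence yields an unbiased estimate of the same base frequency and that these estimates have a known covariance matrix C. Then the weights that sum to 1 and minimize the variance of the weighted average are proportional to the solution of C x = 1. The covariance values and the function names below are hypothetical, and the article's own derivation on a phylogenetic tree may differ in detail.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def optimal_weights(cov):
    """Weights summing to 1 that minimize w^T C w (proportional to C^-1 1)."""
    x = solve(cov, [1.0] * len(cov))
    total = sum(x)
    return [v / total for v in x]


def weighted_variance(cov, w):
    """Variance of the weighted average, w^T C w."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))


# Hypothetical covariance: sequences 0 and 1 are close relatives
# (highly correlated), sequence 2 is a distant relative.
cov = [[1.0, 0.8, 0.1],
       [0.8, 1.0, 0.1],
       [0.1, 0.1, 1.0]]

w_opt = optimal_weights(cov)
w_uniform = [1.0 / 3.0] * 3

print("optimal weights:", w_opt)
print("variance, optimal weights:", weighted_variance(cov, w_opt))
print("variance, uniform weights:", weighted_variance(cov, w_uniform))
```

In this toy setting the two closely related sequences share their weight and the distant sequence is up-weighted, which is the usual motivation for sequence weighting. The abstract's point is that even the best such weights can fall well short of a maximum-likelihood estimator.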

Keywords: Sequence Weights; Maximum Likelihood; Motifs; Phylogeny; Sequencing; Consensus Distribution; Position-Weight Matrices
Date: 2005
Downloads: (external link)
https://doi.org/10.2202/1544-6115.1135 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:4:y:2005:i:1:n:13

Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html

DOI: 10.2202/1544-6115.1135

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

