Hypothesis Tests for Point-Mass Mixture Data with Application to 'Omics Data with Many Zero Values

Taylor Sandra and Pollard Katherine
Additional contact information
Taylor Sandra: University of California, Davis
Pollard Katherine: University of California, San Francisco

Statistical Applications in Genetics and Molecular Biology, 2009, vol. 8, issue 1, 45

Abstract: Data composed of a continuous component plus a point-mass frequently arises in genomic studies. The distribution of this type of data is characterized by the proportion of observations in the point mass and the distribution of the continuous component. Standard statistical methods focus on one of these effects at a time and can fail to detect differences between experimental groups. We propose a novel empirical likelihood ratio test (LRT) statistic for simultaneously testing the null hypothesis of no difference in point-mass proportions and no difference in means of the continuous component. This study evaluates the performance of the empirical LRT and three existing point-mass mixture statistics: 1) Two-part statistic with a t-test for testing mean differences (Two-part t), 2) Two-part statistic with Wilcoxon test for testing mean differences (Two-part W), and 3) parametric LRT. Our investigations begin with an analysis of metabolomics data from Arabidopsis thaliana, which contains many metabolites with a large proportion of observed concentrations in a point-mass at zero. All four point-mass mixture statistics identify more significant differences than standard t-tests and Wilcoxon tests. The empirical LRT appears particularly effective. These findings motivate a large simulation study that assesses Type I and Type II error of the four test statistics with various choices of null distribution. The parametric LRT is frequently the most powerful test, as long as the model assumptions are correct. As is common in 'omics data, the Arabidopsis metabolites have widely varying concentration distributions. A single parametric distribution cannot effectively represent all of these distributions, and individually selecting the optimal parametric distribution to use in the LRT for each metabolite is not practical. The empirical LRT, which does not require parametric assumptions, provides an attractive alternative to parametric and standard methods.
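
The abstract names a "Two-part t" statistic but does not define it. As a rough illustration only, the sketch below assumes the formulation commonly used in the literature on two-part tests: a squared z statistic for the difference in point-mass (zero) proportions plus a squared t statistic for the difference in means of the nonzero values, with the sum referred to a chi-square distribution with 2 degrees of freedom. The function name `two_part_statistic`, the use of a Welch-type t statistic, and the chi-square approximation for the squared t term are assumptions of this sketch, not details taken from the article.

```python
import math
from statistics import mean, variance


def two_part_statistic(x, y):
    """Illustrative two-part test for two samples with a point mass at zero.

    Returns (x2, p_value), where x2 = B^2 + T^2 and the p-value comes from
    a chi-square distribution with 2 degrees of freedom (an approximation).
    """
    n1, n2 = len(x), len(y)
    x_pos = [v for v in x if v != 0]  # continuous component, group 1
    y_pos = [v for v in y if v != 0]  # continuous component, group 2
    m1, m2 = len(x_pos), len(y_pos)

    # Part 1: pooled two-sample z statistic for the proportion of zeros.
    p1 = 1 - m1 / n1
    p2 = 1 - m2 / n2
    p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_b = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se_b == 0:
        raise ValueError("both groups are all zero or all nonzero")
    b = (p1 - p2) / se_b

    # Part 2: Welch-type t statistic on the nonzero values (assumed choice).
    if m1 < 2 or m2 < 2:
        raise ValueError("need at least two nonzero values per group")
    se_t = math.sqrt(variance(x_pos) / m1 + variance(y_pos) / m2)
    if se_t == 0:
        raise ValueError("nonzero values have no variation")
    t = (mean(x_pos) - mean(y_pos)) / se_t

    # Combine both parts; the chi-square(2) survival function is exp(-x/2).
    x2 = b * b + t * t
    p_value = math.exp(-x2 / 2)
    return x2, p_value


# Made-up toy data: group 1 has more zeros and smaller nonzero values.
group1 = [0, 0, 0, 1.0, 2.0, 3.0]
group2 = [0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2, p = two_part_statistic(group1, group2)
print(f"X2 = {x2:.2f}, p = {p:.2g}")
```

Because the statistic sums the two parts, a difference in either the zero proportion or the continuous mean contributes to it, which is the motivation the abstract gives for point-mass mixture tests over a standard t-test or Wilcoxon test applied alone.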

Keywords: point-mass mixture; empirical likelihood; two-part statistics; likelihood ratio test; metabolomics; Arabidopsis
Date: 2009
Downloads:
https://doi.org/10.2202/1544-6115.1425 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:8:y:2009:i:1:n:8

Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html

DOI: 10.2202/1544-6115.1425

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.

 
Page updated 2025-03-19
Handle: RePEc:bpj:sagmbi:v:8:y:2009:i:1:n:8