A Method to Increase the Power of Multiple Testing Procedures Through Sample Splitting

Daniel Rubin, Sandrine Dudoit and Mark van der Laan

Additional contact information
Daniel Rubin: University of California, Berkeley
Sandrine Dudoit: University of California, Berkeley
Mark van der Laan: University of California, Berkeley

Statistical Applications in Genetics and Molecular Biology, 2006, vol. 5, issue 1, 20

Abstract: Consider the standard multiple testing problem in which many hypotheses are to be tested, each hypothesis is associated with a test statistic, and large test statistics provide evidence against the null hypotheses. One proposal for probabilistic control of Type-I errors is to use procedures ensuring that the expected number of false positives does not exceed a user-supplied threshold. Among such multiple testing procedures, we derive the most powerful method, meaning the test statistic cutoffs that maximize the expected number of true positives. Unfortunately, these optimal cutoffs depend on the true, unknown data-generating distribution, so they could never be used in a practical setting. We instead consider splitting the sample so that the optimal cutoffs are estimated from one portion of the data, and testing is then carried out on the remaining data using these estimated cutoffs. When the null distributions for all test statistics are the same, the obvious way to control the expected number of false positives is to use a common cutoff for all tests. In this work, we treat the common cutoff method as a benchmark multiple testing procedure. We show that in certain circumstances the use of estimated optimal cutoffs via sample splitting can dramatically outperform this benchmark, yielding more true discoveries while retaining Type-I error control. This paper is an updated version of the work presented in Rubin et al. (2005), later expanded upon by Wasserman and Roeder (2006).
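
The common cutoff benchmark mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that every test statistic is standard normal under its null hypothesis; the function names are hypothetical and do not come from the paper. With m tests and a bound k on the expected number of false positives, a common cutoff c chosen so that m * P(Z > c) = k keeps the expected number of false positives at or below k, by linearity of expectation. The second function shows why test-specific cutoffs can be used just as well: the bound depends only on the sum of the per-test null tail probabilities. The sketch does not implement the paper's optimal cutoffs or its sample-splitting estimator.

```python
from statistics import NormalDist

# Assumption for this sketch: each test statistic is N(0, 1) under its null.
NULL = NormalDist(mu=0.0, sigma=1.0)


def common_cutoff(num_tests, max_expected_false_positives):
    """Return the cutoff c with num_tests * P(Z > c) = max_expected_false_positives."""
    per_test_level = max_expected_false_positives / num_tests
    if not 0.0 < per_test_level < 1.0:
        raise ValueError("the per-test level k / m must lie strictly between 0 and 1")
    return NULL.inv_cdf(1.0 - per_test_level)


def expected_false_positive_bound(cutoffs):
    """Sum of null tail probabilities P(Z > c_j) over all tests.

    This equals the expected number of false positives if every null is true,
    and is an upper bound on it otherwise.
    """
    return sum(1.0 - NULL.cdf(c) for c in cutoffs)


if __name__ == "__main__":
    m, k = 1000, 1.0
    c = common_cutoff(m, k)
    print(f"common cutoff for m={m}, k={k}: {c:.4f}")
    print(f"bound on expected false positives: {expected_false_positive_bound([c] * m):.4f}")
```

Any other set of cutoffs whose tail probabilities sum to at most k satisfies the same bound, which is the class of procedures within which the abstract describes searching for the most powerful choice.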

Keywords: multiple; testing
Date: 2006
Downloads: (external link)
https://doi.org/10.2202/1544-6115.1148 (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:5:y:2006:i:1:n:19

DOI: 10.2202/1544-6115.1148

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
