A Decision‐Theory Approach to Interpretable Set Analysis for High‐Dimensional Data
Simina M. Boca,
Héctor Corrada Bravo,
Brian Caffo,
Jeffrey T. Leek and
Giovanni Parmigiani
Biometrics, 2013, vol. 69, issue 3, 614-623
Abstract:
A key problem in high‐dimensional significance analysis is to find pre‐defined sets that show enrichment for a statistical signal of interest; the classic example is the enrichment of gene sets for differentially expressed genes. Here, we propose a new decision‐theory approach to the analysis of gene sets which focuses on estimating the fraction of non‐null variables in a set. We introduce the idea of “atoms,” non‐overlapping sets based on the original pre‐defined set annotations. Our approach focuses on finding the union of atoms that minimizes a weighted average of the number of false discoveries and missed discoveries. We introduce a new false discovery rate for sets, called the atomic false discovery rate (afdr), and prove that the optimal estimator in our decision‐theory framework is to threshold the afdr. These results provide a coherent and interpretable framework for the analysis of sets that addresses the key issues of overlapping annotations and difficulty in interpreting p values in both competitive and self‐contained tests. We illustrate our method and compare it to a popular existing method using simulated examples, as well as gene‐set and brain ROI data analyses.
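The two ideas named in the abstract, atoms and thresholding a set-level false discovery rate, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that atoms are formed by grouping variables that share exactly the same set memberships, and that a posterior probability of being null is available for each variable. The function names, the toy gene sets, the probabilities, and the threshold value are all hypothetical.

```python
from collections import defaultdict


def build_atoms(sets):
    """Partition variables into non-overlapping atoms.

    Assumption: an atom is the group of variables that belong to exactly
    the same collection of annotated sets.
    """
    membership = defaultdict(set)
    for name, members in sets.items():
        for variable in members:
            membership[variable].add(name)

    atoms = defaultdict(set)
    for variable, names in membership.items():
        atoms[frozenset(names)].add(variable)
    return dict(atoms)


def atom_null_fraction(atom, p_null):
    """Expected fraction of null variables in an atom (a stand-in for the afdr)."""
    return sum(p_null[v] for v in atom) / len(atom)


def select_atoms(atoms, p_null, threshold):
    """Keep the atoms whose expected null fraction is at most `threshold`."""
    return {
        signature: members
        for signature, members in atoms.items()
        if atom_null_fraction(members, p_null) <= threshold
    }


# Hypothetical overlapping gene sets and posterior null probabilities.
sets = {"A": {"g1", "g2", "g3"}, "B": {"g3", "g4"}}
p_null = {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.2}

atoms = build_atoms(sets)
selected = select_atoms(atoms, p_null, threshold=0.5)

# The reported discovery is the union of the selected atoms.
discovered = set().union(*selected.values()) if selected else set()
print(sorted(discovered))
```

In this toy example the overlap of A and B forms its own atom, so each variable is counted once. Under one simple reading of the loss, with weight w on expected false discoveries and 1 - w on expected missed discoveries, an atom is worth including when its expected null fraction is at most 1 - w; the paper's exact loss and threshold may differ.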
Date: 2013
DOI: https://doi.org/10.1111/biom.12060
Persistent link: https://EconPapers.repec.org/RePEc:bla:biomet:v:69:y:2013:i:3:p:614-623