A Statistical Framework for Hypothesis Testing in Real Data Comparison Studies
Anne-Laure Boulesteix, Robert Hable, Sabine Lauer and Manuel J. A. Eugster
The American Statistician, 2015, vol. 69, issue 3, 201-212
Abstract:
In computational sciences, including computational statistics, machine learning, and bioinformatics, articles presenting new supervised learning methods often claim that the new method performs better than existing methods on real data, for instance in terms of error rate. However, these claims are often not based on proper statistical tests and, even when such tests are performed, the tested hypothesis is not clearly defined and little attention is devoted to Type I and Type II errors. In the present article, we aim to fill this gap by providing a proper statistical framework for hypothesis tests that compare the performances of supervised learning methods based on several real datasets with unknown underlying distributions. After giving a statistical interpretation of ad hoc tests commonly performed by computational researchers, we devote special attention to power issues and outline a simple method of determining the number of datasets to be included in a comparison study to reach an adequate power. These methods are illustrated through three comparison studies from the literature and an exemplary benchmarking study using gene expression microarray data. All our results can be reproduced using R code and datasets available from the companion website (http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/compstud2013).
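Illustration (not part of the published abstract): the power question raised above can be made concrete with a standard normal-approximation sample-size calculation for a paired comparison, in which each dataset contributes one difference in error rate between the two methods. This is a minimal sketch under stated assumptions and is not claimed to be the authors' exact procedure. The function name `datasets_needed` and the parameters `delta` (the true mean difference in error rate one wants to detect) and `sigma` (the standard deviation of the per-dataset differences) are hypothetical names introduced here.

```python
import math
from statistics import NormalDist


def datasets_needed(delta, sigma, alpha=0.05, power=0.8, two_sided=True):
    """Approximate number of datasets for a paired z-test on the mean
    difference in error rate between two methods.

    Assumption: per-dataset differences are roughly normal with known
    standard deviation `sigma`; `delta` is the mean difference to detect.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    std_normal = NormalDist()
    # Critical value controlling the Type I error at level alpha.
    tail = alpha / 2 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1 - tail)
    # Quantile controlling the Type II error (1 - power).
    z_beta = std_normal.inv_cdf(power)

    # Standard formula: n >= ((z_alpha + z_beta) * sigma / |delta|)^2
    return math.ceil(((z_alpha + z_beta) * sigma / abs(delta)) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: detect a 2-point difference in error rate when
    # the differences vary across datasets with a standard deviation of 5 points.
    print(datasets_needed(delta=0.02, sigma=0.05))
```

Under this approximation, halving the difference to be detected roughly quadruples the required number of datasets, which is why comparison studies based on only a handful of datasets tend to have low power.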
Date: 2015
Downloads:
http://hdl.handle.net/10.1080/00031305.2015.1005128 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:amstat:v:69:y:2015:i:3:p:201-212
DOI: 10.1080/00031305.2015.1005128
The American Statistician is currently edited by Eric Sampson