Statistical inference and data mining: false discoveries control
Stéphane Lallich, Olivier Teytaud and Elie Prudhomme
Stéphane Lallich: Université Lyon 2, Equipe de Recherche en Ingénierie des Connaissances
Olivier Teytaud: LRI, CNRS-Université Paris-Sud, TAO-Inria
Elie Prudhomme: Université Lyon 2, Equipe de Recherche en Ingénierie des Connaissances
A chapter in Compstat 2006 - Proceedings in Computational Statistics, Springer, 2006, pp. 325-336.
Abstract:
Data mining is characterized by its ability to process large amounts of data. Among these data are “features”: variables, or association rules that can be derived from them. Selecting the most interesting features is a classical data mining problem. That selection requires a large number of tests, which give rise to a number of false discoveries. This paper proposes an original nonparametric control method. A new criterion, UAFWER, defined as the risk of exceeding a pre-set number of false discoveries, is controlled by BS FD, a bootstrap-based algorithm that can be used on one- or two-sided problems. The usefulness of the procedure is illustrated by the selection of differentially interesting association rules on genetic data.
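The abstract only names the criterion: if V is the number of false discoveries among the selected features and k is the pre-set number tolerated, UAFWER is the probability P(V > k). The Python sketch below shows one generic way a bootstrap can bound that risk; it is not the BS FD algorithm itself, whose details are given in the chapter. Everything here is a hypothetical illustration: each feature's interestingness is taken to be the column mean of a numeric data matrix, a feature is selected when its observed mean exceeds mu0 + eps, and the bootstrap distribution of (resampled mean minus observed mean) stands in for the sampling error of any feature sitting at the boundary mu0. The names `column_means`, `bootstrap_threshold` and `select_features` are illustrative only.

```python
import random
from statistics import fmean


def column_means(rows):
    """Mean of each column (feature) in a list of equal-length numeric rows."""
    return [fmean(row[j] for row in rows) for j in range(len(rows[0]))]


def bootstrap_threshold(rows, k, alpha, n_boot=1000, seed=0):
    """Hypothetical helper: smallest margin eps (among the bootstrap statistics)
    such that, in at most a fraction alpha of bootstrap replicates, more than
    k features have (resampled mean - observed mean) > eps."""
    n, m = len(rows), len(rows[0])
    if not 0 <= k < m:
        raise ValueError("k must satisfy 0 <= k < number of features")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    rng = random.Random(seed)
    observed = column_means(rows)
    stats = []
    for _ in range(n_boot):
        # Resample observations (rows) with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        deviations = sorted(
            (b - o for b, o in zip(column_means(sample), observed)),
            reverse=True,
        )
        # More than k deviations exceed eps exactly when the (k+1)-th largest does.
        stats.append(deviations[k])
    stats.sort()
    allowed = int(alpha * n_boot)  # replicates allowed to have more than k exceedances
    return stats[n_boot - allowed - 1]


def select_features(rows, mu0, k, alpha, n_boot=1000, seed=0):
    """Keep features whose observed mean exceeds mu0 by more than the margin."""
    eps = bootstrap_threshold(rows, k, alpha, n_boot, seed)
    selected = [j for j, mean in enumerate(column_means(rows)) if mean > mu0 + eps]
    return selected, eps


if __name__ == "__main__":
    # Simulated data: 20 features, the first 5 with true mean 1.0, the rest 0.0.
    gen = random.Random(42)
    data = [[gen.gauss(1.0 if j < 5 else 0.0, 1.0) for j in range(20)]
            for _ in range(200)]
    selected, eps = select_features(data, mu0=0.0, k=1, alpha=0.05, n_boot=500)
    print(f"eps = {eps:.3f}; selected features: {selected}")
```

Because eps is read off the (k+1)-th largest deviation in each replicate, the fraction of bootstrap replicates in which more than k features drift above eps by chance is at most alpha, which is the bootstrap estimate of P(V > k) under this setup. A two-sided variant would rank absolute deviations instead of signed ones.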
Keywords: Feature selection; multiple testing; false discoveries; bootstrap
Date: 2006
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-7908-1709-6_25
Ordering information: This item can be ordered from http://www.springer.com/9783790817096
DOI: 10.1007/978-3-7908-1709-6_25
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.