EconPapers    

Statistical inference and data mining: false discoveries control

Stéphane Lallich, Olivier Teytaud and Elie Prudhomme
Stéphane Lallich: Université Lyon 2, Equipe de Recherche en Ingénierie des Connaissances
Olivier Teytaud: LRI, CNRS-Université Paris-Sud, TAO-Inria
Elie Prudhomme: Université Lyon 2, Equipe de Recherche en Ingénierie des Connaissances

A chapter in Compstat 2006 - Proceedings in Computational Statistics, 2006, pp 325-336 from Springer

Abstract: Data mining is characterized by its ability to process large amounts of data. Among these data are the “features”: variables, or association rules that can be derived from them. Selecting the most interesting features is a classical data mining problem. That selection requires a large number of tests, which give rise to a number of false discoveries. An original nonparametric control method is proposed in this paper. A new criterion, UAFWER, defined as the risk of exceeding a preset number of false discoveries, is controlled by BS FD, a bootstrap-based algorithm that can be used on one- or two-sided problems. The usefulness of the procedure is illustrated by the selection of differentially interesting association rules on genetic data.
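Writing V for the number of false discoveries and u for the preset number, the criterion above is UAFWER = P(V > u). The sketch below illustrates one generic way to bound that probability with a bootstrap: a single-step procedure that uses the (u+1)-th largest resampled null statistic as the cutoff. It is not the authors' BS FD algorithm, whose steps are not given in this abstract. The per-feature test (H_j: feature j has mean 0, two-sided), the function names `kfwer_threshold` and `select_features`, and their parameters are hypothetical choices for illustration.

```python
import math
import random
import statistics


def kfwer_threshold(data, u, alpha=0.05, n_boot=500, seed=0):
    """Hypothetical single-step bootstrap cutoff c aiming at P(V > u) <= alpha.

    Assumed setup (not from the chapter): `data` is a list of n observations,
    each a list of m non-constant feature values; H_j says feature j has
    mean 0. V counts true H_j whose |t_j| exceeds the cutoff.
    """
    n, m = len(data), len(data[0])
    if not 0 <= u < m:
        raise ValueError("u must satisfy 0 <= u < number of features")
    rng = random.Random(seed)
    columns = [[row[j] for row in data] for j in range(m)]
    means = [statistics.fmean(col) for col in columns]
    ses = [statistics.stdev(col) / math.sqrt(n) for col in columns]

    kth_largest = []
    for _ in range(n_boot):
        # Resample whole observations so dependence between features is kept.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        # Centre each bootstrap mean on the observed mean to mimic the nulls.
        t_star = sorted(
            (abs(statistics.fmean([row[j] for row in sample]) - means[j]) / ses[j]
             for j in range(m)),
            reverse=True,
        )
        # More than u false discoveries means the (u+1)-th largest null
        # statistic exceeds the cutoff, so record that order statistic.
        kth_largest.append(t_star[u])

    kth_largest.sort()
    # Empirical (1 - alpha) quantile of the recorded order statistics.
    idx = min(n_boot - 1, max(0, math.ceil((1 - alpha) * n_boot) - 1))
    return kth_largest[idx]


def select_features(data, u, alpha=0.05, n_boot=500, seed=0):
    """Return indices j whose two-sided |t_j| exceeds the bootstrap cutoff."""
    n, m = len(data), len(data[0])
    cutoff = kfwer_threshold(data, u, alpha, n_boot, seed)
    selected = []
    for j in range(m):
        col = [row[j] for row in data]
        t_j = abs(statistics.fmean(col)) / (statistics.stdev(col) / math.sqrt(n))
        if t_j > cutoff:
            selected.append(j)
    return selected


if __name__ == "__main__":
    rng = random.Random(1)
    # 40 observations of 20 features; only features 0, 1, 2 have nonzero mean.
    data = [[rng.gauss(2.0 if j < 3 else 0.0, 1.0) for j in range(20)]
            for _ in range(40)]
    print(select_features(data, u=1, n_boot=200, seed=2))
```

Setting u = 0 reduces the cutoff to the quantile of the maximum statistic, i.e. ordinary familywise error control; larger u tolerates a few false discoveries in exchange for a lower cutoff and more selected features. A step-down variant would typically be more powerful than this single-step version.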

Keywords: Feature selection; multiple testing; false discoveries; bootstrap
Date: 2006

Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-7908-1709-6_25

Ordering information: This item can be ordered from
http://www.springer.com/9783790817096

DOI: 10.1007/978-3-7908-1709-6_25

More chapters in Springer Books from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2026-06-25
Handle: RePEc:spr:sprchp:978-3-7908-1709-6_25