Subset selection algorithm based on mutual information
Moon Y. Huh
Additional contact information
Moon Y. Huh: Sungkyunkwan University
A chapter in Compstat 2006 - Proceedings in Computational Statistics, 2006, pp 461-470 from Springer
Abstract:
Best subset selection is one of the classical problems in statistics and in data mining. When the variables of concern are continuous, the problem is a classical one in regression. Most data mining techniques, including decision trees, are designed to handle discrete variables only. With complex data, most of these techniques first transform continuous variables into discrete variables before they are applied. Hence the result depends on the discretization method applied. This paper proposes an algorithm to select a best subset using the original data set. The algorithm is based on mutual information (MI) introduced by Shannon [Shan48]. It computes MIs of up to two-dimensional variables: both continuous, both discrete, or one continuous and one discrete. It has an automatic stopping criterion that applies once an appropriate subset is selected.
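For readers unfamiliar with the quantity the abstract refers to, the following is a minimal sketch of a plug-in estimate of Shannon's mutual information for the both-discrete case only. It is not the chapter's algorithm: the function name `mutual_information` is hypothetical, the estimate is computed from raw joint frequencies, and the continuous and mixed cases as well as the stopping criterion are not covered here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y), in nats, for two discrete sequences.

    Illustrative only (assumed helper, not taken from the chapter):
    I(X;Y) = sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y))),
    with probabilities replaced by observed relative frequencies.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")

    joint = Counter(zip(xs, ys))  # counts of each observed (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # (c/n) * log((c/n) / ((px/n) * (py/n))) simplifies to the line below
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    # A variable shares all of its information with itself,
    # and none with a variable it is independent of.
    print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
    print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

In a selection setting, a score of this kind would be computed between each candidate variable and the target, with larger values indicating stronger dependence. How the chapter combines such scores and decides when to stop is not stated in the abstract.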
Keywords: Variable selection; mutual information; normal mixture; EM algorithm
Date: 2006
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-7908-1709-6_37
Ordering information: This item can be ordered from
http://www.springer.com/9783790817096
DOI: 10.1007/978-3-7908-1709-6_37