Matroska Feature-Selection Method for Microarray Dataset (Method 2)
Shuichi Shinmura
Additional contact information
Shuichi Shinmura: Seikei University, Faculty of Economics
Chapter 8 in New Theory of Discriminant Analysis After R. Fisher, 2016, pp 163-189 from Springer
Abstract:
In this chapter, we introduce the Matroska feature-selection method (Method 2) for microarray datasets (datasets). We have already established the new theory of discriminant analysis (Theory) and developed Revised IP-OLDF. Discriminant analysis has five serious problems. Cases on the discriminant hyperplane could not be discriminated correctly (Problem 1); only Revised IP-OLDF solves this problem theoretically. Only H-SVM and Revised IP-OLDF can theoretically discriminate linearly separable data (LSD) (Problem 2). Problem 3 was that the generalized inverse matrix technique and QDF misclassified all cases into another class in a particular case; we solved Problem 3. Fisher never formulated the standard-error equations for the error rate and the discriminant coefficients (Problem 4). We developed the 100-fold cross-validation for small sample method (Method 1) as an alternative to the leave-one-out (LOO) procedure. Method 1 offers 95 % CIs for the error rate and the discriminant coefficients. We obtained two means of the error rates, M1 and M2, in the training and validation samples, respectively, and proposed a simple model-selection procedure that chooses the model with the minimum M2 as the best model. We compared two statistical LDFs and six MP-based LDFs: Fisher’s LDF, logistic regression, H-SVM, two S-SVMs, Revised IP-OLDF, and two other OLDFs. The best model of Revised IP-OLDF, based on the MNM criterion, was found to be better than the seven other best models (M2s) in the six different types of data. For more than ten years, many researchers have struggled to analyze microarray datasets (Problem 5). Only Revised IP-OLDF can naturally select features. We developed the Matroska feature-selection method (Method 2), which reveals a surprising dataset structure: the dataset is the disjoint union of several linearly separable subspaces (small Matroskas, SMs). Now, we can analyze each SM very quickly.
Recently, many researchers have focused on LASSO for feature selection, which is also the purpose of Method 2. This chapter offers useful datasets and results for LASSO research on the following points: 1. Can an LDF obtained by LASSO discriminate our eight different types of datasets exactly? 2. Can an LDF obtained by LASSO find the Matroska structure correctly and list all of the smallest basic gene sets or subspaces (BGSs)?
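The model-selection rule mentioned in the abstract (average the error rates over the training and validation samples to get M1 and M2, then pick the model with the minimum M2) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the model labels, the toy error rates, and the nearest-rank percentile interval used as a stand-in for the 95 % CI. The abstract does not say how the chapter constructs its interval, so this should not be read as the author's procedure.

```python
import math
from statistics import mean


def percentile_interval(values, lower=2.5, upper=97.5):
    """Nearest-rank percentile interval (an assumed stand-in for a 95 % CI)."""
    ordered = sorted(values)
    n = len(ordered)

    def nearest_rank(p):
        # Nearest-rank definition: smallest value with at least p % of data at or below it.
        k = max(1, math.ceil(p / 100 * n))
        return ordered[min(k, n) - 1]

    return nearest_rank(lower), nearest_rank(upper)


def select_best_model(results):
    """Summarise per-resample error rates and choose the model with minimum M2.

    results maps a (hypothetical) model name to a dict with two lists:
    "train" and "valid", holding one error rate per resample.
    """
    summary = {}
    for name, errors in results.items():
        summary[name] = {
            "M1": mean(errors["train"]),   # mean error rate, training samples
            "M2": mean(errors["valid"]),   # mean error rate, validation samples
            "interval": percentile_interval(errors["valid"]),
        }
    best = min(summary, key=lambda name: summary[name]["M2"])
    return best, summary


# Toy error rates (four resamples instead of 100, purely illustrative).
toy_results = {
    "model_a": {"train": [0.00, 0.02, 0.01, 0.01], "valid": [0.04, 0.06, 0.05, 0.05]},
    "model_b": {"train": [0.00, 0.00, 0.01, 0.00], "valid": [0.02, 0.03, 0.02, 0.01]},
}
best_model, model_summary = select_best_model(toy_results)
print(best_model, model_summary[best_model])
```

Selecting on M2 rather than M1 reflects the point made in the abstract: the validation-sample mean is the criterion for the best model, while M1 is reported alongside it.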
Keywords: Feature-selection; Gene analysis; Microarray dataset (dataset); Minimum number of misclassifications (MNM); Revised IP-OLDF based on MNM criterion by LINGO; SVM by LINGO; Fisher’s LDF by JMP12; Small Matroska (SM); Minimum gene set or subspace (BGS); LASSO
Date: 2016
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-981-10-2164-0_8
Ordering information: This item can be ordered from
http://www.springer.com/9789811021640
DOI: 10.1007/978-981-10-2164-0_8
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.