On the Performance of Variable Selection and Classification via Rank-Based Classifier
Md Showaib Rahman Sarker,
Michael Pokojovy and
Sangjin Kim
Additional contact information
Md Showaib Rahman Sarker: Department of Mathematical Sciences, The University of Texas at El Paso, El Paso, TX 79968, USA
Michael Pokojovy: Department of Mathematical Sciences, The University of Texas at El Paso, El Paso, TX 79968, USA
Sangjin Kim: Department of Mathematical Sciences, The University of Texas at El Paso, El Paso, TX 79968, USA
Mathematics, 2019, vol. 7, issue 5, 1-16
Abstract:
In high-dimensional gene expression data analysis, the accuracy and reliability of cancer classification and the selection of important genes play a crucial role. To identify these important genes and predict future outcomes (tumor vs. non-tumor), various methods have been proposed in the literature. However, only a few of them take into account correlation patterns and grouping effects among the genes. In this article, we propose a rank-based modification of the popular penalized logistic regression procedure based on a combination of ℓ1 and ℓ2 penalties capable of handling possible correlation among genes in different groups. While the ℓ1 penalty maintains sparsity, the ℓ2 penalty induces smoothness based on the information from the Laplacian matrix, which represents the correlation pattern among genes. We combined logistic regression with the BH-FDR (Benjamini and Hochberg false discovery rate) screening procedure and a newly developed rank-based selection method to arrive at an optimal model retaining the important genes. Through simulation studies and a real-world application to high-dimensional colon cancer gene expression data, we demonstrated that the proposed rank-based method outperforms such currently popular methods as lasso, adaptive lasso and elastic net in both gene selection and classification.
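Two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `bh_reject` and `network_penalty`, the use of an unnormalized graph Laplacian (L = D - A), and the parameter names `lam1`, `lam2` and `q` are assumptions made for illustration only. The article may define the Laplacian and the tuning parameters differently, and the rank-based selection step is not shown.

```python
def bh_reject(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (in the original order) marking which
    hypotheses are rejected at false discovery rate level q.
    """
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank k with p_(k) <= k * q / m.
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    # Reject every hypothesis ranked at or below k_max.
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def network_penalty(beta, adjacency, lam1, lam2):
    """Value of lam1 * ||beta||_1 + lam2 * beta^T L beta.

    Assumption: L = D - A is the unnormalized Laplacian of a symmetric
    adjacency matrix A, so beta^T L beta equals the sum over edges i < j
    of A[i][j] * (beta[i] - beta[j]) ** 2.
    """
    p = len(beta)
    l1_term = sum(abs(b) for b in beta)
    smooth_term = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            smooth_term += adjacency[i][j] * (beta[i] - beta[j]) ** 2
    return lam1 * l1_term + lam2 * smooth_term


# Hypothetical usage with made-up numbers.
screen = bh_reject([0.001, 0.04, 0.03, 0.5], q=0.05)
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # genes 0-1 and 1-2 are linked
penalty = network_penalty([1.0, 1.0, 0.0], chain, lam1=0.5, lam2=2.0)
```

In this sketch the ℓ1 term penalizes the number and size of nonzero coefficients, while the Laplacian term penalizes differences between coefficients of linked genes, which is the smoothing effect the abstract describes.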
Keywords: gene-expression data; ℓ2 ridge; ℓ1 lasso; adaptive lasso; elastic net; BH-FDR; Laplacian matrix
JEL-codes: C
Date: 2019
Downloads: (external link)
https://www.mdpi.com/2227-7390/7/5/457/pdf (application/pdf)
https://www.mdpi.com/2227-7390/7/5/457/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:7:y:2019:i:5:p:457-:d:232921
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.