Rank-based classifiers for extremely high-dimensional gene expression data
Ludwig Lausser,
Florian Schmid,
Lyn-Rouven Schirra,
Adalbert F. X. Wilhelm and
Hans A. Kestler ()
Additional contact information
Ludwig Lausser: Ulm University
Florian Schmid: Ulm University
Lyn-Rouven Schirra: Ulm University
Adalbert F. X. Wilhelm: Jacobs University
Hans A. Kestler: Ulm University
Advances in Data Analysis and Classification, 2018, vol. 12, issue 4, No 6, 917-936
Abstract:
Abstract Predicting phenotypes on the basis of gene expression profiles is a classification task that is becoming increasingly important in the field of precision medicine. Although these expression signals are real-valued, it is questionable if they can be analyzed on an interval scale. As with many biological signals their influence on e.g. protein levels is usually non-linear and thus can be misinterpreted. In this article we study gene expression profiles with up to 54,000 dimensions. We analyze these measurements on an ordinal scale by replacing the real-valued profiles by their ranks. This type of rank transformation can be used for the construction of invariant classifiers that are not affected by noise induced by data transformations which can occur in the measurement setup. Our 10 $$\times $$ × 10 fold cross-validation experiments on 86 different data sets and 19 different classification models indicate that classifiers largely benefit from this transformation. Especially random forests and support vector machines achieve improved classification results on a significant majority of datasets.
Keywords: Rank-based classification; Invariance; High-dimensional data; Gene expression data; 62H30 Classification and discrimination; cluster analysis, 68T10 Pattern recognition, speech recognition, 92C40 Biochemistry, molecular biology (search for similar items in EconPapers)
Date: 2018
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://link.springer.com/10.1007/s11634-016-0277-3 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:advdac:v:12:y:2018:i:4:d:10.1007_s11634-016-0277-3
Ordering information: This journal article can be ordered from
http://www.springer. ... ds/journal/11634/PS2
DOI: 10.1007/s11634-016-0277-3
Access Statistics for this article
Advances in Data Analysis and Classification is currently edited by H.-H. Bock, W. Gaul, A. Okada, M. Vichi and C. Weihs
More articles in Advances in Data Analysis and Classification from Springer, German Classification Society - Gesellschaft für Klassifikation (GfKl), Japanese Classification Society (JCS), Classification and Data Analysis Group of the Italian Statistical Society (CLADAG), International Federation of Classification Societies (IFCS)
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().