Chained correlations for feature selection

Ludwig Lausser, Robin Szekely and Hans A. Kestler
Additional contact information
Ludwig Lausser: Ulm University
Robin Szekely: Ulm University
Hans A. Kestler: Ulm University

Advances in Data Analysis and Classification, 2020, vol. 14, issue 4, No 9, 884 pages

Abstract: Data-driven algorithms stand and fall with the availability and quality of existing data sources. Both can be limited in high-dimensional settings ($n \gg m$). For example, supervised learning algorithms designed for molecular pheno- or genotyping are restricted to samples of the corresponding diagnostic classes. Samples of other related entities, such as arise in differential diagnosis, are usually not utilized in this learning scheme. Nevertheless, they might provide domain knowledge on the background or context of the original diagnostic task. In this work, we discuss the possibility of incorporating samples of foreign classes in the training of diagnostic classification models that can be related to the task of differential diagnosis. Especially in heterogeneous data collections comprising multiple diagnostic categories, the foreign ones can change the magnitude of available samples. More precisely, we utilize this information for the internal feature selection process of diagnostic models. We propose the use of chained correlations of original and foreign diagnostic classes. This method allows the detection of intermediate foreign classes by evaluating the correlation between class labels and features for each pair of original and foreign categories. Interestingly, this criterion does not require direct comparisons of the initial diagnostic groups and therefore, might be suitable for settings with restricted data access.
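The pairwise idea in the abstract can be illustrated with a minimal sketch. It is not the authors' criterion: the function names (`label_correlation`, `chained_score`), the use of Pearson correlation against a 0/1 label, and the rule of taking the smaller absolute correlation when both links agree in sign are all assumptions made for illustration. The sketch only shows how a feature could be scored from the pairs (original class a, foreign class c) and (foreign class c, original class b) without ever comparing a and b directly.

```python
import numpy as np

def label_correlation(x_first, x_second):
    """Pearson correlation of each feature with a 0/1 label over two sample groups."""
    X = np.vstack([x_first, x_second])
    y = np.concatenate([np.zeros(len(x_first)), np.ones(len(x_second))])
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features have zero variance; assign them correlation 0.
    return np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)

def chained_score(x_a, x_c, x_b):
    """Hypothetical chained score: weakest link of a -> c -> b, if both links agree in sign."""
    r_ac = label_correlation(x_a, x_c)  # original class a vs. foreign class c
    r_cb = label_correlation(x_c, x_b)  # foreign class c vs. original class b
    same_sign = np.sign(r_ac) == np.sign(r_cb)
    weakest = np.minimum(np.abs(r_ac), np.abs(r_cb))
    return np.where(same_sign, weakest, 0.0)

# Synthetic data (assumed for illustration): 3 features, 30 samples per class.
rng = np.random.default_rng(0)
n = 30
x_a = rng.normal(0.0, 1.0, (n, 3))
x_c = rng.normal(0.0, 1.0, (n, 3))
x_b = rng.normal(0.0, 1.0, (n, 3))
x_c[:, 0] += 5.0   # feature 0: c lies between a and b
x_b[:, 0] += 10.0
x_c[:, 2] += 5.0   # feature 2: c differs from both, but a and b coincide

scores = chained_score(x_a, x_c, x_b)
top_feature = int(np.argmax(scores))
print(scores, top_feature)
```

In this toy setting, feature 0 is the only one on which the foreign class sits between the two original classes, so it receives the highest score. Feature 2 separates the foreign class from both original classes in opposite directions, so its two correlations disagree in sign and the sketch scores it as zero.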

Keywords: Classification; Feature selection; High-dimensional data; Differential diagnosis; 62H30 Classification and discrimination; 62H20 Measures of association (correlation; canonical correlation; etc.); 68T10 Pattern recognition; 62P10 Applications to biology and medical sciences
Date: 2020

Downloads: (external link)
http://link.springer.com/10.1007/s11634-020-00397-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:advdac:v:14:y:2020:i:4:d:10.1007_s11634-020-00397-5

Ordering information: This journal article can be ordered from
http://www.springer. ... ds/journal/11634/PS2

DOI: 10.1007/s11634-020-00397-5

Advances in Data Analysis and Classification is currently edited by H.-H. Bock, W. Gaul, A. Okada, M. Vichi and C. Weihs

More articles in Advances in Data Analysis and Classification from Springer, German Classification Society - Gesellschaft für Klassifikation (GfKl), Japanese Classification Society (JCS), Classification and Data Analysis Group of the Italian Statistical Society (CLADAG), International Federation of Classification Societies (IFCS)
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:advdac:v:14:y:2020:i:4:d:10.1007_s11634-020-00397-5