
Interpretation and Visualization of Non-Linear Data Fusion in Kernel Space: Study on Metabolomic Characterization of Progression of Multiple Sclerosis

Agnieszka Smolinska, Lionel Blanchet, Leon Coulier, Kirsten A M Ampt, Theo Luider, Rogier Q Hintzen, Sybren S Wijmenga and Lutgarde M C Buydens

PLOS ONE, 2012, vol. 7, issue 6, 1-12

Abstract:

Background: In the last decade data fusion has become widespread in the field of metabolomics. Linear data fusion is the most common approach. However, many datasets display non-linear parameter dependences, and linear methods are bound to fail in such situations. We used proton Nuclear Magnetic Resonance and Gas Chromatography-Mass Spectrometry, two well-established techniques, to generate metabolic profiles of cerebrospinal fluid of Multiple Sclerosis (MScl) individuals. These datasets represent non-linearly separable groups. Thus, a special data fusion framework is required to extract the relevant information and to combine the datasets.

Methodology: The main aim is to demonstrate a novel approach to data fusion for classification; the approach is applied to metabolomics datasets from patients suffering from MScl at different stages of the disease. The approach involves data fusion in kernel space and consists of four main steps. The first step is to extract the significant information per data source using Support Vector Machine Recursive Feature Elimination, which selects a set of relevant variables. In the next step the optimized kernel matrices are merged by linear combination. In step 3 the merged datasets are analyzed with a classification technique, namely Kernel Partial Least Squares Discriminant Analysis. In the final step, the variables in kernel space are visualized and their significance is established.

Conclusions: We find that fusion in kernel space allows for efficient and reliable discrimination of classes (MScl and early stage). This data fusion approach achieves better class prediction accuracy than analysis of the individual datasets and than the commonly used mid-level fusion. The prediction accuracy on an independent test set (8 samples) reaches 100%. Additionally, the classification model obtained on the fused kernels is simpler: just one latent variable was sufficient. Finally, visualization of variable importance in kernel space was achieved.
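
The second step of the methodology, merging kernel matrices by linear combination, can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian (RBF) kernel, the kernel widths, the mixing weight `mu`, the function names and the toy data are all assumptions made for illustration, and the feature selection and classification steps are not shown. The idea is that each data source yields its own samples-by-samples kernel matrix, so the two matrices can be combined even when the sources have different numbers of variables.

```python
import math

def rbf_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix for the rows of X (kernel choice is an assumption)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return K

def fuse_kernels(K_a, K_b, mu):
    """Weighted linear combination mu * K_a + (1 - mu) * K_b of two kernel matrices."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    if len(K_a) != len(K_b):
        raise ValueError("kernel matrices must describe the same samples")
    return [
        [mu * a + (1.0 - mu) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(K_a, K_b)
    ]

# Hypothetical toy data: the same 4 samples measured on two platforms
# with different numbers of (already selected) variables.
X_nmr = [[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8]]
X_gcms = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.1, 1.0, 0.6], [0.0, 0.9, 0.7]]

K_nmr = rbf_kernel(X_nmr, sigma=0.5)    # sigma values are illustrative only
K_gcms = rbf_kernel(X_gcms, sigma=1.0)
K_fused = fuse_kernels(K_nmr, K_gcms, mu=0.5)
```

The fused matrix stays symmetric with the same samples-by-samples shape, so it can be passed to any kernel-based classifier. In practice the weight and the kernel parameters would be tuned, for example by cross-validation.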

Date: 2012

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0038163 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 38163&type=printable (application/pdf)



Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0038163

DOI: 10.1371/journal.pone.0038163



Handle: RePEc:plo:pone00:0038163