AdaCLV for Interpretable Variable Clustering and Dimensionality Reduction of Spectroscopic Data

Marion, Rebecca; Govaerts, Bernadette; von Sachs, Rainer

Rebecca Marion: Université catholique de Louvain, LIDAM/ISBA, Belgium
Bernadette Govaerts: Université catholique de Louvain, LIDAM/ISBA, Belgium
Rainer von Sachs: Université catholique de Louvain, LIDAM/ISBA, Belgium

No 2020033, LIDAM Reprints ISBA from Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA)

Abstract: This paper introduces a new method, Adaptive Clustering around Latent Variables (AdaCLV), for simultaneous dimensionality reduction and variable clustering, the partitioning of variables into groups. This unsupervised method is particularly well suited for the exploration of spectroscopic datasets, such as Nuclear Magnetic Resonance (NMR) spectra, and can be used for the identification of potential biomarkers. AdaCLV is inspired by existing multivariate methods from the Clustering around Latent Variables (CLV) family, but it offers several key advantages with respect to these methods. First, AdaCLV allows variables to belong to multiple clusters with varying degrees. A cluster membership degree is estimated for each variable and cluster, and these memberships are used to define non-orthogonal latent variables that summarize the clusters. As a result, the clusters and latent variables identified by AdaCLV are more interpretable and representative of spectroscopic data, where peaks for different molecules (i.e. variable clusters) may overlap and variables within a cluster have different degrees of importance. Second, while the performance of existing methods depends greatly on hyperparameter selection, AdaCLV is less sensitive to its hyperparameters, adapting to the clustering structure present in the data. This paper compares AdaCLV with existing CLV methods and other competitors in experiments involving real and semi-artificial NMR spectra. AdaCLV is shown to be more robust to hyperparameter choice and to have better precision than the other methods, for all cluster numbers, sample sizes and levels of signal tested, while achieving a comparable level of recall.
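
As a rough illustration of the idea that membership degrees define latent variables which need not be orthogonal, the sketch below builds one latent variable per cluster as a membership-weighted sum of the variables. It is not the AdaCLV estimation procedure: the function name `latent_variables`, the toy data and the hand-fixed membership matrices `U_hard` and `U_soft` are all assumptions made for illustration, whereas the method described in the abstract estimates the memberships from the data.

```python
import numpy as np


def latent_variables(X, U):
    """Return one latent variable per cluster as a membership-weighted sum.

    X: (n_samples, n_variables) data matrix.
    U: (n_variables, n_clusters) membership degrees, fixed by hand here
       (an illustrative assumption; they are not estimated in this sketch).
    """
    Xc = X - X.mean(axis=0)  # centre each variable
    return Xc @ U


# Toy data: 6 variables with centred, exactly orthonormal columns, so that any
# overlap between latent variables comes from the memberships alone.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 6))
X, _ = np.linalg.qr(A - A.mean(axis=0))

# Hard partition: variables 0-2 belong to cluster 0, variables 3-5 to cluster 1.
U_hard = np.array(
    [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float
)

# Soft memberships: variables 2 and 3 are shared between the two clusters,
# mimicking overlapping peaks.
U_soft = U_hard.copy()
U_soft[2] = [0.6, 0.4]
U_soft[3] = [0.4, 0.6]

T_hard = latent_variables(X, U_hard)
T_soft = latent_variables(X, U_soft)

# Off-diagonal entries of the Gram matrix measure non-orthogonality.
gram_hard = T_hard.T @ T_hard
gram_soft = T_soft.T @ T_soft
print("hard partition, inner product:", gram_hard[0, 1])
print("soft memberships, inner product:", gram_soft[0, 1])
```

In this toy setting the hard partition gives latent variables that are orthogonal up to rounding error, while the shared variables make the soft latent variables overlap, with an inner product of about 0.48 (0.6 × 0.4 + 0.4 × 0.6).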

Keywords: Variable clustering; Latent variables; Dimensionality reduction; Nuclear magnetic resonance spectra
Date: 2020-11-15
Note: In: Chemometrics and Intelligent Laboratory Systems - Vol. 206 (2020)

Persistent link: https://EconPapers.repec.org/RePEc:aiz:louvar:2020033

DOI: 10.1016/j.chemolab.2020.104169
