A variable selection approach in the multivariate linear model: an application to LC-MS metabolomics data
Perrot-Dockès Marie,
Lévy-Leduc Céline,
Chiquet Julien,
Sansonnet Laure,
Brégère Margaux,
Étienne Marie-Pierre,
Robin Stéphane and
Genta-Jouve Grégory
Additional contact information
Perrot-Dockès Marie: UMR MIA-Paris, AgroParisTech, INRA – Université Paris-Saclay, 75005 Paris, France
Lévy-Leduc Céline: UMR MIA-Paris, AgroParisTech, INRA – Université Paris-Saclay, 75005 Paris, France
Chiquet Julien: UMR MIA-Paris, AgroParisTech, INRA – Université Paris-Saclay, 75005 Paris, France
Sansonnet Laure: UMR MIA-Paris, AgroParisTech, INRA – Université Paris-Saclay, 75005 Paris, France
Brégère Margaux: UMR MIA-Paris, AgroParisTech, INRA – Université Paris-Saclay, 75005 Paris, France
Étienne Marie-Pierre: UMR MIA-Paris, AgroParisTech, INRA – Université Paris-Saclay, 75005 Paris, France
Robin Stéphane: UMR MIA-Paris, AgroParisTech, INRA – Université Paris-Saclay, 75005 Paris, France
Genta-Jouve Grégory: UMR CNRS 8638 Comète – Université Paris-Descartes, CNRS, 75006 Paris, France
Statistical Applications in Genetics and Molecular Biology, 2018, vol. 17, issue 5, 14
Abstract:
Omic data are characterized by strong dependence structures that result either from data acquisition or from underlying biological processes. Statistical procedures whose variable selection step does not account for this dependence pattern may lose power and select spurious variables. This paper proposes a variable selection procedure within the multivariate linear model framework that accounts for the dependence between the multiple responses. We focus on a specific type of dependence in which the responses of a given individual are modelled as a time series. We propose a novel Lasso-based approach that takes this dependence structure into account by using the covariance structures of different types of stationary processes for the random error matrix. Our numerical experiments show that including the estimation of the covariance matrix of the random error matrix in the Lasso criterion dramatically improves variable selection performance. Our approach is successfully applied to an untargeted LC-MS (Liquid Chromatography-Mass Spectrometry) data set of African copal samples. Our methodology is implemented in the R package MultiVarSel, which is available from the Comprehensive R Archive Network (CRAN).
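The idea of adjusting for time-series dependence among the responses can be illustrated with a minimal sketch. It assumes the simplest stationary case, an AR(1) error process with unit innovation variance, and uses the standard closed-form whitening transform for that process. All function names below are hypothetical and written for illustration only; this is not the MultiVarSel API, and the Lasso step that would follow on the whitened data is not shown.

```python
import math

def ar1_covariance(phi, q):
    """Covariance of a stationary AR(1) process (unit innovation variance)."""
    return [[phi ** abs(i - j) / (1.0 - phi ** 2) for j in range(q)]
            for i in range(q)]

def ar1_whitening_matrix(phi, q):
    """Matrix W such that a row vector e with AR(1) covariance yields
    e @ W with identity covariance (assumed AR(1) model, |phi| < 1)."""
    w = [[0.0] * q for _ in range(q)]
    w[0][0] = math.sqrt(1.0 - phi ** 2)   # rescale the first coordinate
    for t in range(1, q):
        w[t][t] = 1.0                     # e_t ...
        w[t - 1][t] = -phi                # ... minus phi * e_{t-1}
    return w

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def estimate_phi(residual_rows):
    """Least-squares estimate of phi from residual rows (one row per
    individual), pooling lag-1 products over all rows."""
    num = sum(r[t] * r[t - 1] for r in residual_rows for t in range(1, len(r)))
    den = sum(x * x for r in residual_rows for x in r[:-1])
    return num / den

# Check that whitening removes the dependence: W' Sigma W should be the identity.
phi, q = 0.6, 5
sigma = ar1_covariance(phi, q)
w = ar1_whitening_matrix(phi, q)
whitened_cov = matmul(matmul(transpose(w), sigma), w)
max_dev = max(abs(whitened_cov[i][j] - (1.0 if i == j else 0.0))
              for i in range(q) for j in range(q))

# Whitening a (toy) response matrix Y: each row is one individual's responses.
y = [[1.0, 0.5, 0.25, 0.125, 0.0625]]
phi_hat = estimate_phi(y)
y_whitened = matmul(y, ar1_whitening_matrix(phi_hat, len(y[0])))
```

In a full procedure along the lines the abstract describes, the coefficient would be estimated from the residuals of a preliminary fit, both sides of the model would be whitened in this way, and a Lasso criterion would then be applied to the transformed data.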
Keywords: metabolomics; multivariate linear model; time series; variable selection
Date: 2018
Downloads:
https://doi.org/10.1515/sagmb-2017-0077 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.
Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:17:y:2018:i:5:p:14:n:3
Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html
DOI: 10.1515/sagmb-2017-0077
Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.