Multivariate filter methods for feature selection with the γ-metric
Nicolas Ngo,
Pierre Michel and
Roch Giorgi
Additional contact information
Nicolas Ngo: AMU - Aix Marseille Université, INSERM - Institut National de la Santé et de la Recherche Médicale, SESSTIM - U1252 INSERM - Aix Marseille Univ - UMR 259 IRD - Sciences Economiques et Sociales de la Santé & Traitement de l'Information Médicale - IRD - Institut de Recherche pour le Développement - AMU - Aix Marseille Université - INSERM - Institut National de la Santé et de la Recherche Médicale, ISSPAM - Institut des sciences de la santé publique [Marseille]
Pierre Michel: AMU - Aix Marseille Université, CNRS - Centre National de la Recherche Scientifique, AMSE - Aix-Marseille Sciences Economiques - EHESS - École des hautes études en sciences sociales - AMU - Aix Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique
Roch Giorgi: AMU - Aix Marseille Université, APHM - Assistance Publique - Hôpitaux de Marseille, INSERM - Institut National de la Santé et de la Recherche Médicale, SESSTIM - U1252 INSERM - Aix Marseille Univ - UMR 259 IRD - Sciences Economiques et Sociales de la Santé & Traitement de l'Information Médicale - IRD - Institut de Recherche pour le Développement - AMU - Aix Marseille Université - INSERM - Institut National de la Santé et de la Recherche Médicale, ISSPAM - Institut des sciences de la santé publique [Marseille], TIMONE - Hôpital de la Timone [CHU - APHM], BiosTIC - Biostatistique et technologies de l'information et de la communication (BioSTIC) - [Hôpital de la Timone - APHM] - APHM - Assistance Publique - Hôpitaux de Marseille - TIMONE - Hôpital de la Timone [CHU - APHM], IRD [Occitanie] - Institut de Recherche pour le Développement
Post-Print from HAL
Abstract:
Background: The γ-metric value is generally used as the importance score of a feature (or a set of features) in a classification context. This study aimed to go further by creating a new methodology for multivariate feature selection for classification, whereby the γ-metric is associated with a specific search direction (and therefore a specific stopping criterion). As three search directions are used, we effectively created three distinct methods.
Methods: We assessed the performance of our new methodology through a simulation study, comparing the three resulting methods against more conventional methods. Classification performance indicators, number of selected features, stability and execution time were used to evaluate the performance of the methods. We also evaluated how well the proposed methodology selected relevant features for the detection of atrial fibrillation (AF), which is a cardiac arrhythmia.
Results: We found that, in both the simulation study and the AF detection task, our methods were able to select informative features and maintain a good level of predictive performance; however, with strong correlation and large datasets, the γ-metric-based methods were less effective at excluding non-informative features.
Conclusions: The results highlighted that the forward search direction combines well with the γ-metric as an evaluation function. However, with the backward search direction, the feature selection algorithm could fall into a local optimum and can be improved.
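As a rough illustration of what pairing an evaluation function with a forward search direction and a stopping criterion can look like, here is a minimal Python sketch. It is not the authors' implementation. The γ-metric is not defined in this record, so the score used here (`separation_score`, a squared Mahalanobis distance between two class means) is a hypothetical stand-in. The function names, the `min_gain` stopping threshold and the synthetic data are likewise assumptions made for the example.

```python
import numpy as np


def separation_score(X, y):
    """Hypothetical stand-in score, NOT the γ-metric.

    Squared Mahalanobis distance between the two class means,
    computed with the pooled within-class covariance.
    """
    X0, X1 = X[y == 0], X[y == 1]
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    pooled = (
        (len(X0) - 1) * np.atleast_2d(np.cov(X0, rowvar=False))
        + (len(X1) - 1) * np.atleast_2d(np.cov(X1, rowvar=False))
    ) / (len(X) - 2)
    pooled += 1e-8 * np.eye(pooled.shape[0])  # small ridge for numerical stability
    return float(diff @ np.linalg.solve(pooled, diff))


def forward_select(X, y, score=separation_score, min_gain=0.05):
    """Greedy forward search: add the feature that most improves the score.

    Assumed stopping criterion: stop once the best candidate improves the
    current score by less than a relative margin `min_gain`.
    """
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # Score every candidate subset "selected + one more feature".
        scores = {j: score(X[:, selected + [j]], y) for j in remaining}
        candidate = max(scores, key=scores.get)
        # The first feature is always accepted; later ones must clear the margin.
        if selected and scores[candidate] < (1.0 + min_gain) * best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected


# Synthetic two-class data: features 0 and 1 are informative, 2 to 4 are noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 500)
X = rng.normal(size=(1000, 5))
X[:, :2] += 2.0 * y[:, None]

selected = forward_select(X, y)
print(sorted(selected))
```

Because the score is evaluated on whole subsets rather than on single features, the search is multivariate: a candidate is judged by what it adds to the features already selected. A backward variant would start from all features and remove one at a time, which needs its own stopping rule.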
Keywords: Atrial fibrillation; Classification; Feature selection; γ-metric
Date: 2024-12-19
Note: View the original document on HAL open archive server: https://hal.science/hal-04848056v1
Published in BMC Medical Research Methodology, 2024, 24 (1), pp.307. ⟨10.1186/s12874-024-02426-9⟩
Downloads: https://hal.science/hal-04848056v1/document (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-04848056
DOI: 10.1186/s12874-024-02426-9