ACG-SFE: Adaptive cluster-guided simple, fast, and efficient feature selection for high-dimensional microarray data in binary classification
Yi Wei Tye, XinYing Chew, Umi Kalsom Yusof and Samat Tulpar
PLOS ONE, 2025, vol. 20, issue 9, 1-38
Abstract:
Advances in data collection have resulted in an exponential growth of high-dimensional microarray datasets for binary classification in bioinformatics and medical diagnostics. These datasets generally possess many features but relatively few samples, resulting in challenges associated with the “curse of dimensionality”, such as feature redundancy and an elevated risk of overfitting. While traditional feature selection approaches, such as filter-based and wrapper-based approaches, can help to reduce dimensionality, they often struggle to capture feature interactions while adequately preserving model generalization. Therefore, this paper introduces the Adaptive Cluster-Guided Simple, Fast, and Efficient (ACG-SFE) feature selection, a hybrid approach designed to address the challenges of high-dimensional microarray data in binary classification. ACG-SFE enhances the Simple, Fast, and Efficient (SFE) evolutionary feature selection model by integrating hierarchical clustering to dynamically group correlated features based on the optimal number of clusters determined by the Silhouette index, Davies-Bouldin score, and the feature-to-observation ratio while adaptively selecting representative features within clusters using mutual information and adjusting the selection threshold through a progress factor. This hybrid filter-wrapper approach improves feature interactions, effectively minimizing redundancy and overfitting while enhancing classification performance. The proposed model is assessed against four state-of-the-art evolutionary feature selection models on 11 high-dimensional microarray datasets. Experimental results indicate that ACG-SFE effectively selects a small yet pertinent feature subset, minimizing redundancy while attaining enhanced classification accuracy and F-measure. Furthermore, its reduced RMSE between train and test accuracy substantiates its capability to reduce overfitting, outperforming comparative models. 
These findings establish ACG-SFE as an effective feature selection model for handling high-dimensional microarray data in binary classification, enhancing classification accuracy while selecting minimal relevant features to reduce unnecessary complexity and the risk of overfitting.
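The cluster-guided filter step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several simplifying assumptions: features are grouped by naive average-linkage clustering on the distance 1 - |Pearson r|, the number of clusters is chosen by the silhouette score alone (the paper also uses the Davies-Bouldin score and the feature-to-observation ratio), mutual information is estimated from a median split of each feature, and the SFE evolutionary search and the progress-factor threshold are omitted. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def distance_matrix(cols):
    # Strongly correlated features (either sign) get a distance near 0.
    p = len(cols)
    d = [[0.0] * p for _ in range(p)]
    for i, j in combinations(range(p), 2):
        d[i][j] = d[j][i] = 1.0 - abs(pearson(cols[i], cols[j]))
    return d

def agglomerate(d, k):
    # Naive average linkage: merge the closest pair until k clusters remain.
    clusters = [[i] for i in range(len(d))]
    def link(a, b):
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))
    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: link(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] += clusters.pop(b)  # a < b, so index a stays valid
    return clusters

def silhouette(d, clusters):
    # Mean silhouette over features; singleton clusters score 0 by convention.
    scores = []
    for c in clusters:
        for i in c:
            if len(c) == 1:
                scores.append(0.0)
                continue
            a = sum(d[i][j] for j in c if j != i) / (len(c) - 1)
            b = min(sum(d[i][j] for j in o) / len(o)
                    for o in clusters if o is not c)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def mutual_information(col, y):
    # MI in nats between a median-split feature and the class label.
    med = sorted(col)[len(col) // 2]
    x = [int(v >= med) for v in col]
    n = len(y)
    mi = 0.0
    for xv in (0, 1):
        for yv in set(y):
            pxy = sum(1 for a, b in zip(x, y) if a == xv and b == yv) / n
            if pxy > 0:
                px, py = x.count(xv) / n, list(y).count(yv) / n
                mi += pxy * math.log(pxy / (px * py))
    return mi

def cluster_guided_select(X, y, k_max=5):
    # X is a list of samples (rows); returns one feature index per cluster.
    cols = [list(c) for c in zip(*X)]
    d = distance_matrix(cols)
    candidates = (agglomerate(d, k)
                  for k in range(2, min(k_max, len(cols) - 1) + 1))
    best = max(candidates, key=lambda cl: silhouette(d, cl))
    return sorted(max(c, key=lambda j: mutual_information(cols[j], y))
                  for c in best)

# Toy data (hypothetical): features 0 and 1 track the label,
# features 2 and 3 alternate independently of it.
y = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [1, 2, 1, 2, 8, 9, 8, 9],
    [1.1, 2.2, 0.9, 2.1, 8.2, 9.1, 7.9, 9.3],
    [5, 1, 5, 1, 5, 1, 5, 1],
    [5.2, 0.9, 4.8, 1.1, 5.1, 1.2, 4.9, 0.8],
]
X = list(zip(*columns))
selected = cluster_guided_select(X, y)
print(selected)
```

On the toy data the two correlated pairs form two clusters, and one representative is kept from each. In a hybrid filter-wrapper scheme of the kind the abstract describes, a reduced candidate set like this would then be refined by the wrapper search.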
Date: 2025
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0331089 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 31089&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0331089
DOI: 10.1371/journal.pone.0331089