A multiple filter-wrapper feature selection algorithm based on process optimization mechanism for high-dimensional omics data analysis
Yongtao Shi, Yuefeng Zheng and Xiaotong Bai
PLOS ONE, 2025, vol. 20, issue 12, 1-44
Abstract:
Recently, hybrid feature selection methods have demonstrated excellent performance on high-dimensional data, but many of these methods tend to yield relatively homogeneous feature subsets. To address this, we propose a novel hybrid feature selection algorithm called the Hybrid Multiple Filter-Wrapper algorithm. This algorithm employs a dual-module structure: Module 1 uses the random forest feature importance method to achieve significant dimensionality reduction of the original feature set, yielding the candidate feature subset F1. In Module 2, we first propose a bivariate filter algorithm, the Minimum Spearman-Maximum Mutual Information method, which assesses both the relevance and the redundancy of the features in F1; its results are then fed into the wrapper algorithm for further exploration. Furthermore, we integrate two swarm intelligence algorithms to develop the Hybrid Grey Wolf and Chaotic Dung Beetle Wrapper Algorithm. This algorithm incorporates chaos theory to enhance the position update mechanism of the Dung Beetle Algorithm, then embeds the Dung Beetle Algorithm into the Grey Wolf Algorithm, thereby balancing exploration and exploitation capabilities. Finally, a process optimization mechanism based on the theory of random laser intensity fluctuations dynamically monitors the optimization process. When the wrapper algorithm converges to a local optimum, the filter algorithm is restarted and chaos theory is used to reset the population. This enhances the diversity of both the candidate feature subset and the population, effectively avoiding local optima. We extensively compare our method with ten hybrid algorithms from the past three years across ten public benchmark datasets from MGE.
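The two filter stages described above can be sketched as follows. This is a minimal illustration assuming scikit-learn and SciPy; the function names, the retained-feature count, and the exact relevance-redundancy scoring formula (an mRMR-style difference) are assumptions for illustration, not the paper's implementation.

```python
# Sketch of Module 1 (random forest importance filter) and the bivariate
# Spearman / mutual-information score of Module 2. Thresholds and the
# difference-style criterion are illustrative assumptions.
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

def rf_importance_filter(X, y, keep=200, seed=0):
    """Module 1: rank features by random forest importance, keep the top ones."""
    rf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]
    return order[:min(keep, X.shape[1])]  # indices of the candidate subset F1

def spearman_mi_score(X, y, selected, candidate):
    """Bivariate score: high mutual information with the label (relevance),
    low mean |Spearman rho| against already-selected features (redundancy)."""
    mi = mutual_info_classif(X[:, [candidate]], y, random_state=0)[0]
    if not selected:
        return mi
    redundancy = np.mean([abs(spearmanr(X[:, candidate], X[:, s])[0])
                          for s in selected])
    return mi - redundancy  # difference criterion (an assumption)
```

In a greedy forward pass, one would repeatedly add the candidate in F1 that maximizes `spearman_mi_score`, then hand the ranked subset to the wrapper stage.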
Experimental results show that our algorithm outperforms most other algorithms: across all datasets, it achieves an average classification accuracy at least 1.3% higher, an average feature subset at least 8 features shorter, and a dimensionality reduced to less than 0.45% of the original. These results are statistically significant.
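The restart mechanism described in the abstract, resetting the population with chaos theory once the wrapper stalls, is commonly implemented with a logistic map. The sketch below illustrates that general idea; the map parameters, the stall test, and the function names are assumptions, not the paper's values.

```python
# Illustrative chaos-based population reset: when the best fitness has
# stalled, regenerate agent positions with the logistic map x <- r*x*(1-x)
# instead of uniform sampling. Parameters are illustrative assumptions.
import numpy as np

def logistic_map_population(n_agents, n_dims, x0=0.7, r=4.0):
    """Generate a population of positions in (0, 1) via the logistic map."""
    pop = np.empty((n_agents, n_dims))
    x = x0
    for i in range(n_agents):
        for j in range(n_dims):
            x = r * x * (1.0 - x)
            pop[i, j] = x
    return pop

def maybe_restart(best_history, population, patience=15, tol=1e-6):
    """Reset the population if the best fitness has not improved by more
    than `tol` over the last `patience` iterations; otherwise keep it."""
    if len(best_history) >= patience and \
            abs(best_history[-1] - best_history[-patience]) < tol:
        return logistic_map_population(*population.shape)
    return population
```

With r = 4 the logistic map is fully chaotic, so consecutive values cover (0, 1) irregularly, which is why it is a popular drop-in replacement for uniform reinitialization in swarm optimizers.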
Date: 2025
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0338051 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 38051&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0338051
DOI: 10.1371/journal.pone.0338051