Gene Selection Algorithms in a Single-Cell Gene Decision Space Based on Self-Information

Fang, Yan; Lin, Yonghua; Huang, Chuanbo; Li, Zhaowen

Yan Fang, Yonghua Lin, Chuanbo Huang and Zhaowen Li
Additional contact information
Yan Fang: Fujian Provincial Key Laboratory of Data-Intensive Computing, Fujian University Laboratory of Intelligent Computing and Information Processing, School of Mathematics and Computer Science, Quanzhou Normal University, Quanzhou 362000, China
Yonghua Lin: Fujian Key Laboratory of Financial Information Processing, Key Laboratory of Applied Mathematics in Fujian Province University, Putian University, Putian 351100, China
Chuanbo Huang: Fujian Province University Key Laboratory of Computational Science, School of Mathematical Sciences, Huaqiao University, Quanzhou 362000, China
Zhaowen Li: Fujian Key Laboratory of Financial Information Processing, Key Laboratory of Applied Mathematics in Fujian Province University, Putian University, Putian 351100, China

Mathematics, 2025, vol. 13, issue 11, 1-24

Abstract: A critical step in rough-set-based gene selection is establishing a gene evaluation function that assesses the classification ability of candidate gene subsets. In the classic neighborhood rough set model, dependency plays the role of this evaluation function. This criterion considers only the information provided by the lower approximation and omits the upper approximation, which may lose important information. This paper proposes gene selection algorithms within a single-cell gene decision space that employ self-information and take both lower and upper approximations into account. First, the distance between gene expression values within each subspace is defined to establish a tolerance relation on the cell set. Self-information measures are then introduced in terms of tolerance classes, and the relationships among these measures and their properties are examined in detail. For gene expression data, the proposed self-information metric has an advantage over other measures because it accounts for both lower and upper approximations, which facilitates the selection of optimal gene subsets. Finally, gene selection algorithms within a single-cell gene decision space are developed based on the proposed self-information metric. Experiments on 10 publicly available single-cell datasets indicate that selecting genes pertinent to classification can improve classification performance. The results show that Fi-SI achieves an average classification accuracy of 93.7% (KNN) while selecting 48.3% fewer genes than Fisher's score.
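
The abstract's building blocks (a tolerance relation on cells, then lower and upper approximations of each decision class) can be illustrated with a minimal sketch. It is not the authors' algorithm and does not reproduce the paper's self-information measures, which the abstract does not define. The function names, the per-gene absolute-difference threshold `delta`, and the toy data are all assumptions made for illustration; only the textbook definitions of tolerance classes, approximations, and dependency are used.

```python
from typing import Dict, List, Sequence, Set, Tuple


def tolerance_classes(cells: Sequence[Sequence[float]],
                      genes: Sequence[int],
                      delta: float) -> List[Set[int]]:
    """For each cell, collect the cells within `delta` on every selected gene.

    Assumption: distance is the absolute difference per gene; the paper's
    actual distance may differ.
    """
    n = len(cells)
    return [
        {j for j in range(n)
         if all(abs(cells[i][g] - cells[j][g]) <= delta for g in genes)}
        for i in range(n)
    ]


def approximations(classes: List[Set[int]],
                   labels: Sequence[str]) -> Dict[str, Tuple[Set[int], Set[int]]]:
    """Return {label: (lower, upper)} for each decision class."""
    result = {}
    for label in set(labels):
        target = {i for i, lab in enumerate(labels) if lab == label}
        # Lower: cells whose whole tolerance class lies inside the target.
        lower = {i for i, t in enumerate(classes) if t <= target}
        # Upper: cells whose tolerance class overlaps the target.
        upper = {i for i, t in enumerate(classes) if t & target}
        result[label] = (lower, upper)
    return result


def dependency(approx: Dict[str, Tuple[Set[int], Set[int]]], n_cells: int) -> float:
    """Classic dependency: fraction of cells in some lower approximation."""
    positive = set().union(*(lower for lower, _ in approx.values()))
    return len(positive) / n_cells


# Toy data (hypothetical): five cells, one gene, two cell types.
cells = [[1.0], [2.0], [8.0], [9.0], [5.0]]
labels = ["A", "A", "B", "B", "A"]

classes = tolerance_classes(cells, genes=[0], delta=3.0)
approx = approximations(classes, labels)
gamma = dependency(approx, len(cells))
```

In this toy case the dependency uses only the lower approximations, while the upper approximation of class A also contains a cell labeled B. That boundary information is the kind the abstract says a dependency-only criterion omits.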

Keywords: single-cell gene decision space; gene selection; self-information
JEL-codes: C
Date: 2025

Downloads:
https://www.mdpi.com/2227-7390/13/11/1829/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/11/1829/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:11:p:1829-:d:1668368

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.