Semi-Supervised Attribute Selection Algorithms for Partially Labeled Multiset-Valued Data

Yuanzi He, Jiali He, Haotian Liu and Zhaowen Li
Additional contact information
Yuanzi He: College of Computer Science, Guangdong University of Science and Technology, Dongguan 523083, China
Jiali He: Key Laboratory of Complex System Optimization and Big Data Processing, Department of Guangxi Education, Yulin Normal University, Yulin 537000, China
Haotian Liu: Center for Applied Mathematics of Guangxi, Yulin Normal University, Yulin 537000, China
Zhaowen Li: College of Computer Science, Guangdong University of Science and Technology, Dongguan 523083, China

Mathematics, 2025, vol. 13, issue 8, 1-33

Abstract: In machine learning, semi-supervised learning algorithms are used when only part of the data is labeled. A dataset with missing attribute values or labels is referred to as an incomplete information system. Handling incomplete information in such a system is a significant challenge, which can be tackled effectively with rough set theory (R-theory). However, R-theory has its limits: it does not consider the frequency of an attribute value and therefore cannot reflect the distribution of attribute values appropriately. If we consider partially labeled data and replace each missing attribute value with the multiset of all possible attribute values under the same attribute, the result is partially labeled multiset-valued data. In a semi-supervised learning algorithm, a large number of redundant features need to be deleted in order to save time and cost. This study proposes semi-supervised attribute selection algorithms for partially labeled multiset-valued data. First, a partially labeled multiset-valued decision information system (p-MSVDIS) is partitioned into two distinct systems: a labeled multiset-valued decision information system (l-MSVDIS) and an unlabeled multiset-valued decision information system (u-MSVDIS). Then, using the indistinguishable relation, the distinguishable relation, and the dependence function, two types of attribute subset importance in a p-MSVDIS are defined. Each is a weighted sum of the importance in the l-MSVDIS and in the u-MSVDIS, with weights determined by the missing rate of labels, and each can be considered an uncertainty measurement (UM) of a p-MSVDIS. Next, two adaptive semi-supervised attribute selection algorithms for a p-MSVDIS are introduced; they use these degrees of importance and so adapt automatically to different missing rates. Finally, experiments and statistical analyses are conducted on 11 datasets. The results indicate that the proposed algorithms have advantages over certain existing algorithms.
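
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `MISSING` marker, all function names, and the reading of "all possible attribute values" as the values observed in the column are assumptions made for illustration. The weighting `(1 - rate) * labeled + rate * unlabeled` is likewise only one plausible form of a weighted sum determined by the label missing rate; the abstract does not give the exact formula, and the two importance scores are taken as given inputs here.

```python
from collections import Counter

MISSING = None  # hypothetical marker for a missing attribute value or label


def to_multiset_column(column):
    """Turn one attribute column into multiset values.

    A known value becomes a singleton multiset; a missing value is replaced
    by the multiset of all values observed under the same attribute
    (assumed reading of "all possible attribute values").
    """
    observed = Counter(v for v in column if v is not MISSING)
    return [
        Counter(observed) if v is MISSING else Counter({v: 1})
        for v in column
    ]


def split_by_label(rows, labels):
    """Partition objects into a labeled part and an unlabeled part."""
    labeled = [(r, y) for r, y in zip(rows, labels) if y is not MISSING]
    unlabeled = [r for r, y in zip(rows, labels) if y is MISSING]
    return labeled, unlabeled


def label_missing_rate(labels):
    """Fraction of objects whose label is missing."""
    return sum(y is MISSING for y in labels) / len(labels)


def combined_importance(labeled_score, unlabeled_score, rate):
    """Assumed weighting: the more labels are missing, the more weight
    the score from the unlabeled system receives."""
    return (1 - rate) * labeled_score + rate * unlabeled_score
```

For example, with labels `[1, 0, None, 1]` the missing rate is 0.25, so under this assumed weighting the labeled score contributes three quarters of the combined importance.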

Keywords: partially labeled multiset-valued data; p-MSVDIS; uncertainty measurement; semi-supervised attribute selection; dependence function; information entropy
JEL-codes: C
Date: 2025

Downloads:
https://www.mdpi.com/2227-7390/13/8/1318/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/8/1318/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:8:p:1318-:d:1636992

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-04-18
Handle: RePEc:gam:jmathe:v:13:y:2025:i:8:p:1318-:d:1636992