An Ensemble and Iterative Recovery Strategy Based k GNN Method to Edit Data with Label Noise
Baiyun Chen,
Longhai Huang,
Zizhong Chen and
Guoyin Wang
Additional contact information
Baiyun Chen: Chongqing Key Laboratory of Computational Intelligence, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Longhai Huang: Chongqing Key Laboratory of Computational Intelligence, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Zizhong Chen: Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
Guoyin Wang: Chongqing Key Laboratory of Computational Intelligence, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Mathematics, 2022, vol. 10, issue 15, 1-28
Abstract:
Learning label noise is gaining increasing attention from a variety of disciplines, particularly in supervised machine learning for classification tasks. The k nearest neighbors ( k NN) classifier is often used as a natural way to edit the training sets due to its sensitivity to label noise. However, the k NN-based editor may remove too many instances if not designed to take care of the label noise. In addition, the one-sided nearest neighbor (NN) rule is unconvincing, as it just considers the nearest neighbors from the perspective of the query sample. In this paper, we propose an ensemble and iterative recovery strategy-based k GNN method (EIRS- k GNN) to edit data with label noise. EIRS- k GNN first uses the general nearest neighbors (GNN) to expand the one-sided NN rule to a binary-sided NN rule, taking the neighborhood of the queried samples into account. Then, it ensembles the prediction results of a finite set of k s in the k GNN to prudently judge the noise levels for each sample. Finally, two loops, i.e., the inner loop and the outer loop, are leveraged to iteratively detect label noise. A frequency indicator is derived from the iterative processes to guide the mixture approaches, including relabeling and removing, to deal with the detected label noise. The goal of EIRS- k GNN is to recover the distribution of the data set as if it were not corrupted. Experimental results on both synthetic data sets and UCI benchmarks, including binary data sets and multi-class data sets, demonstrate the effectiveness of the proposed EIRS- k GNN method.
Keywords: label noise; iterative; ensemble; mutual nearest neighborhood; relabel and remove (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/2227-7390/10/15/2743/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/15/2743/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:15:p:2743-:d:879128
Access Statistics for this article
Mathematics is currently edited by Ms. Emma He
More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().