Similarity Ranking-Based Instance Selection for Enhancing k-NN Classification Performances
Abdul Muqtasid bin Rushdi,
Mohammad bin Hossin,
Suhaila binti Saee and
Norita binti Md Norwawi
Acta Informatica Pragensia, vol. preprint
Abstract:
Background: The k-nearest neighbours (k-NN) algorithm is a well-established classifier in machine learning. Yet its performance drops and its computational costs rise with extensive or redundant datasets. Furthermore, current instance selection (IS) approaches often face scalability problems and are sensitive to parameter settings.

Objective: This study seeks to design a straightforward and efficient IS algorithm that reduces both dataset size and computational demands, yet preserves or enhances the accuracy of k-NN classification.

Methods: We propose Euclidean ranking-based instance selection (ERbIS), a novel IS approach that prioritises samples based on their Euclidean distance from a single anchor point. In this study, two anchor points are introduced: the first data anchor point (FD) and the mean of each column anchor point (MEC). Both ERbIS models (ERbIS-FD and ERbIS-MEC) are evaluated across 21 KEEL datasets. For performance comparison, the ERbIS models are benchmarked against current and state-of-the-art methods, including condensed nearest neighbour rule (CNN), edited nearest neighbour rule (ENN), adaptive threshold-based instance selection algorithm (ATISA1), decremental reduction optimization procedure (DROP3) and ranking-based instance selection (RIS1). The evaluation focuses on reduction speed, reduction rate and k-NN classification accuracy.

Results: The ERbIS models reduce dataset size by an average of 35 to 40% without compromising accuracy compared to the original k-NN and state-of-the-art IS models. Both ERbIS models also demonstrate superior computational efficiency in the reduction process relative to ENN and CNN. Notably, the ERbIS-MEC variant, which utilises the mean of each column as the anchor point, achieves the highest generalisation accuracy among all current and state-of-the-art models.

Conclusion: ERbIS offers an efficient and scalable approach for instance selection in k-NN classification, achieving significant data reduction and enhanced predictive accuracy with minimal parameter tuning. The model demonstrates strong potential for application to large datasets and may be further improved by investigating alternative distance metrics or integrating hybrid instance selection strategies.
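
The ranking idea described in the Methods can be illustrated with a minimal Python sketch. It covers only what the abstract states: an anchor point (the first sample for FD, the column means for MEC) and a ranking of samples by Euclidean distance to that anchor. The abstract does not say how the ranking is turned into a retained subset, so the `select_instances` function, its `keep_ratio` parameter and the evenly spaced pick along the ranking are hypothetical choices made for illustration, not the authors' procedure. All function names are likewise assumptions.

```python
import math


def anchor_first(X):
    """FD anchor: the first sample in the dataset."""
    return list(X[0])


def anchor_column_mean(X):
    """MEC anchor: the mean of each feature column."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def rank_by_distance(X, anchor):
    """Return sample indices sorted by Euclidean distance to the anchor."""
    dists = [math.dist(row, anchor) for row in X]
    return sorted(range(len(X)), key=lambda i: dists[i])


def select_instances(X, y, anchor_fn, keep_ratio=0.6):
    """Keep a fraction of the samples, picked at evenly spaced ranks.

    ASSUMPTION: this selection rule and keep_ratio are illustrative only;
    the abstract does not specify how ERbIS chooses retained samples.
    """
    ranked = rank_by_distance(X, anchor_fn(X))
    n_keep = max(1, round(keep_ratio * len(X)))
    step = len(ranked) / n_keep
    kept = sorted(ranked[int(i * step)] for i in range(n_keep))
    return [X[i] for i in kept], [y[i] for i in kept]


if __name__ == "__main__":
    X = [[0, 0], [3, 4], [1, 0], [0, 2], [6, 8]]
    y = ["a", "b", "a", "b", "a"]
    print(rank_by_distance(X, anchor_first(X)))
    print(select_instances(X, y, anchor_column_mean))
```

Under these assumptions, the reduced set returned by `select_instances` would then be passed to any k-NN classifier in place of the full training set. A `keep_ratio` of 0.6 roughly mirrors the 35 to 40% average reduction reported in the Results, but the value is a free parameter of this sketch rather than a setting taken from the paper.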
Keywords: Instance selection; k-nearest neighbours; Data reduction; Euclidean distance; Data classification
Download: http://aip.vse.cz/doi/10.18267/j.aip.310.html (text/html, free of charge)
Persistent link: https://EconPapers.repec.org/RePEc:prg:jnlaip:v:preprint:id:310
Ordering information: This journal article can be ordered from
Redakce Acta Informatica Pragensia, Katedra systémové analýzy, Vysoká škola ekonomická v Praze, nám. W. Churchilla 4, 130 67 Praha 3
http://aip.vse.cz
DOI: 10.18267/j.aip.310