A New K-Nearest Neighbors Classifier for Big Data Based on Efficient Data Pruning
Hamid Saadatfar,
Samiyeh Khosravi,
Javad Hassannataj Joloudari,
Amir Mosavi and
Shahaboddin Shamshirband
Additional contact information
Hamid Saadatfar: Computer Engineering Department, Faculty of Engineering, University of Birjand, Birjand 9717434765, Iran
Samiyeh Khosravi: Computer Engineering Department, Faculty of Engineering, University of Birjand, Birjand 9717434765, Iran
Javad Hassannataj Joloudari: Computer Engineering Department, Faculty of Engineering, University of Birjand, Birjand 9717434765, Iran
Amir Mosavi: Institute of Structural Mechanics, Bauhaus Universität Weimar, 99423 Weimar, Germany
Shahaboddin Shamshirband: Department for Management of Science and Technology Development, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Mathematics, 2020, vol. 8, issue 2, 1-12
Abstract:
The K-nearest neighbors (KNN) machine learning algorithm is a well-known non-parametric classification method. However, like other traditional data mining methods, applying it to big data poses computational challenges. KNN determines the class of a new sample from the classes of its nearest neighbors, but identifying those neighbors in a large dataset is so computationally expensive that a single computing machine can no longer handle it. One technique proposed to make classification methods applicable to large datasets is pruning. LC-KNN is an improved KNN method that first clusters the data into smaller partitions using K-means clustering, and then, for each new sample, applies KNN only within the partition whose center is nearest to the sample. However, because the clusters have different shapes and densities, selecting the appropriate cluster is a challenge. In this paper, an approach is proposed to improve the pruning phase of the LC-KNN method by taking these factors into account. The proposed approach helps to choose a more appropriate cluster of data in which to search for the neighbors, thus increasing the classification accuracy. The performance of the proposed approach is evaluated on several real datasets. The experimental results show the effectiveness of the proposed approach and its higher classification accuracy and lower time cost in comparison to other recent relevant methods.
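The baseline LC-KNN pipeline the abstract describes (K-means partitioning, then a local KNN search in the cluster whose center is nearest to the query) can be sketched as follows. This is a minimal illustration of the pruning idea, not the authors' implementation; the class name `LCKNNSketch` and all parameter defaults are assumptions for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

class LCKNNSketch:
    """Sketch of LC-KNN-style pruning: partition the training data with
    K-means, then answer each query with a KNN search restricted to the
    partition whose center is nearest to the query."""

    def __init__(self, n_clusters=3, n_neighbors=3, random_state=0):
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10,
                             random_state=random_state)
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # Pruning phase: split the data into smaller partitions.
        labels = self.kmeans.fit_predict(X)
        # One local KNN model per partition.
        self.local_knn = {}
        for c in range(self.kmeans.n_clusters):
            mask = labels == c
            # A partition may hold fewer than n_neighbors points.
            k = min(self.n_neighbors, int(mask.sum()))
            knn = KNeighborsClassifier(n_neighbors=k)
            knn.fit(X[mask], y[mask])
            self.local_knn[c] = knn
        return self

    def predict(self, X):
        # Route each query to the nearest cluster center, then classify
        # using only that partition's points.
        centers = self.kmeans.predict(X)
        return np.array([self.local_knn[c].predict(x.reshape(1, -1))[0]
                         for c, x in zip(centers, X)])
```

The paper's contribution is precisely where this sketch is weakest: routing by nearest center alone ignores cluster shape and density, which is the selection problem the proposed approach addresses.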
Keywords: K-nearest neighbors; KNN; classifier; machine learning; big data; clustering; cluster shape; cluster density; classification; reinforcement learning; machine learning for big data; data science; computation; artificial intelligence
JEL-codes: C
Date: 2020
Downloads:
https://www.mdpi.com/2227-7390/8/2/286/pdf (application/pdf)
https://www.mdpi.com/2227-7390/8/2/286/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:8:y:2020:i:2:p:286-:d:322934
Mathematics is currently edited by Ms. Emma He