EconPapers    

Partition Selection for Large-Scale Data Management Using KNN Join Processing

Yue Hu, Ge Peng, Zehua Wang, Yanrong Cui and Hang Qin

Mathematical Problems in Engineering, 2020, vol. 2020, 1-14

Abstract:

As datasets grow ever larger, the k nearest neighbors (KNN) algorithm becomes a particularly expensive operation for both classification and regression predictive problems. To predict the values of new data points, KNN computes the feature similarity between each object in the test dataset and every object in the training dataset. Because of this computational cost, a single computer cannot handle large-scale datasets. In this paper, we propose an adaptive vKNN algorithm, which is based on the Voronoi diagram, runs under the MapReduce parallel framework, and makes full use of the advantages of parallel computing in processing large-scale data. For partition selection, we design a new predictive strategy that finds the optimal relevant partition for each sample point. This lets us effectively identify irrelevant data, reduce KNN join computation, and improve operational efficiency. Finally, we conduct extensive experiments on a cluster using a large number of 54-dimensional datasets. The experimental results show that the proposed method is effective and scalable while maintaining accuracy.
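The abstract describes partitioning the training data with a Voronoi diagram and, for each query point, selecting only the partitions that can still contain one of its k nearest neighbors. The sketch below illustrates that general idea on a single machine. It is not the authors' vKNN predictive strategy: pivots are chosen by random sampling (an assumption), cells are pruned with a plain triangle-inequality bound (a standard technique swapped in for illustration), and the names `knn_join`, `assign_to_pivots`, `num_pivots`, and `seed` are hypothetical.

```python
import heapq
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def assign_to_pivots(points: Sequence[Point], pivots: Sequence[Point]) -> Dict[int, List[int]]:
    """Voronoi partitioning: group point indices by their nearest pivot."""
    cells: Dict[int, List[int]] = {j: [] for j in range(len(pivots))}
    for idx, p in enumerate(points):
        nearest = min(range(len(pivots)), key=lambda j: math.dist(p, pivots[j]))
        cells[nearest].append(idx)
    return cells


def knn_join(queries: Sequence[Point], data: Sequence[Point], k: int,
             num_pivots: int = 4, seed: int = 0) -> List[List[int]]:
    """For each query, return the indices of its k nearest data points, closest first.

    Hypothetical sketch: random pivots and triangle-inequality pruning are assumptions,
    not the method from the paper.
    """
    rng = random.Random(seed)
    pivots = rng.sample(list(data), min(num_pivots, len(data)))
    cells = assign_to_pivots(data, pivots)
    # Radius of each cell: largest distance from its pivot to a member (0 if empty).
    radius = {j: max((math.dist(data[i], pivots[j]) for i in members), default=0.0)
              for j, members in cells.items()}
    results: List[List[int]] = []
    for q in queries:
        d_to_pivot = [math.dist(q, p) for p in pivots]
        # Lower bound on the distance from q to any point in cell j:
        # d(q, r) >= d(q, p_j) - d(r, p_j) >= d(q, p_j) - radius_j.
        bounds = sorted((max(0.0, d_to_pivot[j] - radius[j]), j)
                        for j in cells if cells[j])
        heap: List[Tuple[float, int]] = []  # max-heap of (-distance, index)
        for lb, j in bounds:
            if len(heap) == k and lb > -heap[0][0]:
                break  # cells are sorted by bound, so no later cell can improve the result
            for i in cells[j]:
                d = math.dist(q, data[i])
                if len(heap) < k:
                    heapq.heappush(heap, (-d, i))
                elif d < -heap[0][0]:
                    heapq.heapreplace(heap, (-d, i))
        # Sort the survivors by ascending distance, then by index.
        results.append([i for _, i in sorted(heap, key=lambda t: (-t[0], t[1]))])
    return results
```

For example, `knn_join([(0, 0)], [(1, 0), (2, 0)], k=1)` returns `[[0]]`. In a MapReduce setting, each non-empty cell could plausibly be assigned to a reducer and each query sent only to the cells its bound does not rule out. That distribution step is not shown here and is only one possible design.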

Date: 2020

Downloads:
http://downloads.hindawi.com/journals/MPE/2020/7898230.pdf (application/pdf)
http://downloads.hindawi.com/journals/MPE/2020/7898230.xml (text/xml)



Persistent link: https://EconPapers.repec.org/RePEc:hin:jnlmpe:7898230

DOI: 10.1155/2020/7898230


More articles in Mathematical Problems in Engineering from Hindawi
Bibliographic data for series maintained by Mohamed Abdelhakeem.

 
Page updated 2025-03-19
Handle: RePEc:hin:jnlmpe:7898230