Using Proximity Graph Cut for Fast and Robust Instance-Based Classification in Large Datasets

Protasov, Stanislav; Khan, Adil Mehmood; Cheong, Siew Ann

Complexity, 2021, vol. 2021, 1-9

Abstract: K-nearest neighbours (kNN) is a very popular instance-based classifier due to its simplicity and good empirical performance. However, large-scale datasets make it difficult to build fast and compact neighbourhood-based classifiers. This work presents the design and implementation of a classification algorithm with index data structures, which allows us to build fast and scalable solutions for large multidimensional datasets. We propose a novel approach that uses a navigable small-world (NSW) proximity graph representation of large-scale datasets. Our approach shows a 2–4 times classification speedup for both average and 99th percentile time, with asymptotically close classification accuracy compared to the 1-NN method. We observe two orders of magnitude better classification time in cases when the method uses swap memory. We show that the NSW graph used in our method outperforms other proximity graphs in classification accuracy. Our results suggest that the algorithm can be used in large-scale applications for fast and robust classification, especially when the search index is already constructed for the data.

Date: 2021

Downloads: (external link)
http://downloads.hindawi.com/journals/complexity/2021/2011738.pdf (application/pdf)
http://downloads.hindawi.com/journals/complexity/2021/2011738.xml (application/xml)

Persistent link: https://EconPapers.repec.org/RePEc:hin:complx:2011738

DOI: 10.1155/2021/2011738

Bibliographic data for series maintained by Mohamed Abdelhakeem.