EconPapers    
Economics at your fingertips  
 

Manifold Learning with Approximate Nearest Neighbors

Fan Cheng, Rob Hyndman and Anastasios Panagiotelis ()

No 3/21, Monash Econometrics and Business Statistics Working Papers from Monash University, Department of Econometrics and Business Statistics

Abstract: Manifold learning algorithms are valuable tools for the analysis of high-dimensional data, many of which include a step where nearest neighbors of all observations are found. This can present a computational bottleneck when the number of observations is large or when the observations lie in more general metric spaces, such as statistical manifolds, which require all pairwise distances between observations to be computed. We resolve this problem by using a broad range of approximate nearest neighbor algorithms within manifold learing algorithms and evaluating their impact on embedding accuracy. We use approximate nearest neighbors for statistical maifolds by exploiting the connection between Hellinger/Total variation distance for discrete distributions and the L2/L1 norm. Via a thorough empirical investigation based on the benchmark MNIST dataset, it is shown that approximate nearest neighbors lead to substantial improvements in computational time with little to no loss in the accuracy of the embedding produced by a manifold learning algorithm. This result is robust to the use of different manifold learning algorithms, to the use of different approximate nearest neighbor algorithms, and to the use of different measures of embedding accuracy. The proposed method is applied to learning statistical manifolds data on distributions of electricity usage. This application demonstrates how the proposed methods can be used to visualize and identify anomalies and uncover underlying structure within high-dimensional data in a way that is scalable to large datasets.

Keywords: statistical manifold; dimension reduction; anomaly detection; k-d trees; Hellinger distance; smart meter data (search for similar items in EconPapers)
JEL-codes: C55 C65 C80 (search for similar items in EconPapers)
Pages: 40
Date: 2021
New Economics Papers: this item is included in nep-big, nep-cmp, nep-ecm and nep-ore
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.monash.edu/business/ebs/research/publications/ebs/wp03-2021.pdf (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:msh:ebswps:2021-3

Ordering information: This working paper can be ordered from
http://business.mona ... -business-statistics

Access Statistics for this paper

More papers in Monash Econometrics and Business Statistics Working Papers from Monash University, Department of Econometrics and Business Statistics PO Box 11E, Monash University, Victoria 3800, Australia. Contact information at EDIRC.
Bibliographic data for series maintained by Professor Xibin Zhang ().

 
Page updated 2025-03-30
Handle: RePEc:msh:ebswps:2021-3