Anomaly Detection in High Dimensional Data
Priyanga Talagala (dilini.talagala@monash.edu),
Rob Hyndman and
Kate Smith-Miles (smith-miles@unimelb.edu.au)
No 20/19, Monash Econometrics and Business Statistics Working Papers from Monash University, Department of Econometrics and Business Statistics
Abstract:
The HDoutliers algorithm is a powerful unsupervised algorithm for detecting anomalies in high-dimensional data, with a strong theoretical foundation. However, it suffers from limitations that significantly hinder its performance under certain circumstances. In this article, we propose an algorithm that addresses these limitations. We define an anomaly as an observation that deviates markedly from the majority with a large distance gap. The anomaly threshold is calculated using an approach based on extreme value theory. Using various synthetic and real datasets, we demonstrate the wide applicability and usefulness of our algorithm, which we call the stray algorithm. We also demonstrate how this algorithm can assist in detecting anomalies in other data structures using feature engineering. We show situations where the stray algorithm outperforms the HDoutliers algorithm in both accuracy and computational time. This framework is implemented in the open source R package stray.
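To make the "large distance gap" idea in the abstract concrete, the following is a minimal Python sketch, not the stray algorithm itself and not the API of the R package. It scores each observation by the distance to its nearest neighbour and flags the observations that sit above the largest gap in the sorted scores. The function names (`nn_distances`, `flag_by_largest_gap`) and the parameters (`tail_fraction`, `min_gap_ratio`) are hypothetical, and the simple gap rule is a stand-in for the extreme value theory threshold described in the paper.

```python
import math
import statistics


def nn_distances(points):
    """Distance from each point to its nearest other point (brute force)."""
    scores = []
    for i, p in enumerate(points):
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(nearest)
    return scores


def flag_by_largest_gap(scores, tail_fraction=0.5, min_gap_ratio=3.0):
    """Return indices of scores lying above the largest gap in the upper tail.

    Assumption: a gap counts as "large" only if it exceeds min_gap_ratio
    times the median score. This heuristic replaces the extreme value
    theory threshold used in the paper.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    start = max(int(len(order) * (1 - tail_fraction)), 1)

    best_gap, cut = 0.0, None
    for pos in range(start, len(order)):
        gap = scores[order[pos]] - scores[order[pos - 1]]
        if gap > best_gap:
            best_gap, cut = gap, pos

    if cut is None or best_gap <= min_gap_ratio * statistics.median(scores):
        return []
    return sorted(order[cut:])


# Illustrative data: a 3 x 3 grid of typical points plus one distant point.
grid = [(x, y) for x in range(3) for y in range(3)]
points = grid + [(10, 10)]

flagged = flag_by_largest_gap(nn_distances(points))
print(flagged)
```

With the illustrative data above, only the distant point at index 9 is flagged, and the grid on its own produces no flags. The brute-force neighbour search is quadratic in the number of observations, so this sketch is only suitable for small examples.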
Keywords: data stream; high-dimensional data; nearest neighbour searching; unsupervised outlier detection
JEL-codes: C1 C55 C8
Pages: 30
Date: 2019
New Economics Papers: this item is included in nep-cmp and nep-ore
Downloads: (external link)
https://www.monash.edu/business/ebs/research/publications/ebs/wp20-2019.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:msh:ebswps:2019-20
Ordering information: This working paper can be ordered from
http://business.mona ... -business-statistics
econometrics@monash.edu
More papers in Monash Econometrics and Business Statistics Working Papers from Monash University, Department of Econometrics and Business Statistics PO Box 11E, Monash University, Victoria 3800, Australia. Contact information at EDIRC.
Bibliographic data for series maintained by Professor Xibin Zhang (xibin.zhang@monash.edu).