A Nonparametric Subspace Analysis Approach with Application to Anomaly Detection Ensembles
Irad Ben-Gal,
Marcelo Bacher,
Morris Amara and
Erez Shmueli
Additional contact information
Irad Ben-Gal: Department of Industrial Engineering, Tel Aviv University, 69978 Tel Aviv, Israel
Marcelo Bacher: Department of Industrial Engineering, Tel Aviv University, 69978 Tel Aviv, Israel
Morris Amara: Department of Industrial Engineering, Tel Aviv University, 69978 Tel Aviv, Israel
Erez Shmueli: Department of Industrial Engineering, Tel Aviv University, 69978 Tel Aviv, Israel
INFORMS Journal on Data Science, 2023, vol. 2, issue 2, 99-115
Abstract:
Identifying anomalies in multidimensional data sets is an important yet challenging task in many real-world applications. A special case arises when anomalies are hidden in a small subset of attributes. We propose a new subspace analysis approach, called agglomerative attribute grouping (AAG), that searches for subspaces composed of highly correlated (in the general sense) attributes. Such correlations among attributes can better reflect the behavior of normal observations and hence can be used to improve the identification of abnormal data samples. The proposed AAG algorithm relies on a generalized multiattribute measure, derived from information-theoretic measures over attributes’ partitions, for evaluating the “information distance” among various subsets of attributes. To determine the set of subspaces, AAG applies a variation of the well-known agglomerative clustering algorithm with the proposed measure as the underlying distance function; in contrast to existing methods, AAG does not require any parameter tuning. Finally, the resulting set of informative subspaces can be used to improve subspace-based analytical tasks, such as anomaly detection, novelty detection, forecasting, and clustering. Extensive evaluation over real-world data sets demonstrates that (i) in the vast majority of cases, AAG outperforms both classical and state-of-the-art subspace analysis methods when used in anomaly and novelty detection ensembles; (ii) it often generates fewer subspaces with fewer attributes each, resulting in faster training times for the anomaly and novelty detection ensembles; and (iii) the generated subspaces can also be useful in other analytical tasks, such as clustering and forecasting.
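The abstract does not spell out AAG's exact multiattribute measure or its parameter-free stopping rule, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes discrete attributes, takes the partition induced by an attribute subset to be the grouping of samples by their joint values, and uses the Rokhlin distance d(A, B) = H(A|B) + H(B|A) = 2H(A,B) - H(A) - H(B) between two attribute groups, normalized by H(A,B) (the normalization is an assumption of this sketch). The function names `entropy`, `rokhlin_distance`, and `agglomerate` are hypothetical. Instead of implementing AAG's stopping criterion, the sketch greedily merges the closest pair of groups and records the full merge history.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (bits) of the partition induced by a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def joint_labels(data, attrs):
    """Label each row by its tuple of values on `attrs` (the joint partition)."""
    return [tuple(row[a] for a in attrs) for row in data]


def rokhlin_distance(data, group_a, group_b, normalized=True):
    """Hypothetical helper: d(A, B) = 2*H(A,B) - H(A) - H(B) over attribute groups.

    With normalized=True the distance is divided by H(A,B) so it lies in [0, 1];
    this normalization is an assumption, not taken from the paper.
    """
    h_a = entropy(joint_labels(data, group_a))
    h_b = entropy(joint_labels(data, group_b))
    h_ab = entropy(joint_labels(data, tuple(group_a) + tuple(group_b)))
    d = 2 * h_ab - h_a - h_b
    if normalized:
        return d / h_ab if h_ab > 0 else 0.0
    return d


def agglomerate(data, n_attrs):
    """Greedy agglomerative grouping of attributes (a sketch, not the AAG algorithm).

    Starts from singleton groups and repeatedly merges the pair of groups with
    the smallest distance, returning (group_i, group_j, distance) for each merge.
    """
    groups = [(a,) for a in range(n_attrs)]
    history = []
    while len(groups) > 1:
        (i, j), dist = min(
            (((i, j), rokhlin_distance(data, groups[i], groups[j]))
             for i, j in combinations(range(len(groups)), 2)),
            key=lambda item: item[1],
        )
        history.append((groups[i], groups[j], round(dist, 4)))
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return history


# Toy data: attribute 1 copies attribute 0, and attribute 3 is the complement
# of attribute 2, so {0, 1} and {2, 3} are the fully dependent groups.
data = [
    (0, 0, 0, 1),
    (0, 0, 1, 0),
    (1, 1, 0, 1),
    (1, 1, 1, 0),
]
history = agglomerate(data, n_attrs=4)
for step in history:
    print(step)
```

On this toy data the dependent pairs {0, 1} and {2, 3} merge first at distance 0, and the two resulting groups, which are mutually independent, merge last at the maximal normalized distance of 1. A real subspace search would stop before that final merge; the abstract states that AAG decides this without tuning parameters, and that rule is not reproduced here.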
Keywords: subspace analysis; anomaly detection; novelty detection; Rokhlin distance
Date: 2023
Downloads: http://dx.doi.org/10.1287/ijds.2023.0027 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:orijds:v:2:y:2023:i:2:p:99-115
More articles in INFORMS Journal on Data Science from INFORMS.
Bibliographic data for series maintained by Chris Asher.