Outlier preservation by dimensionality reduction techniques
Martijn Onderwater
International Journal of Data Analysis Techniques and Strategies, 2015, vol. 7, issue 3, 231-252
Abstract:
Sensors are increasingly part of our daily lives: motion detection, lighting control, and energy consumption monitoring all rely on them. Combining the information they produce into, for instance, simple and comprehensive graphs can be quite challenging. Dimensionality reduction is often used to address this problem: it decreases the number of variables in the data by looking for shorter representations. However, dimensionality reduction is usually aimed at normal, everyday data, and applying it to events that deviate from this data (so-called outliers) can affect those events negatively; in particular, outliers might go unnoticed. In this paper, we show that dimensionality reduction can indeed have a large impact on outliers. To that end, we apply three dimensionality reduction techniques to three real-world datasets and inspect how well they preserve outliers. We use several performance measures to quantify how well these techniques preserve outliers, and we discuss the results.
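The evaluation idea described above, flagging outliers before and after reduction and scoring how well the two sets of flags agree, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure. It uses PCA computed with NumPy's SVD, a hypothetical per-feature z-score detector standing in for whatever detector the paper uses (such as peeling), synthetic data rather than the paper's sensor datasets, and only two of the measures named in the keywords (F1-score and Matthews correlation). The helper names `pca_reduce`, `zscore_outliers`, `f1_score` and `mcc` are illustrative.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto its top-k principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def zscore_outliers(Y, threshold=4.0):
    """Hypothetical detector: flag a row if any of its coordinates lies more
    than `threshold` standard deviations from that coordinate's mean."""
    z = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return np.any(np.abs(z) > threshold, axis=1)


def confusion(truth, pred):
    """Counts (tp, tn, fp, fn) for boolean outlier labels."""
    tp = int(np.sum(truth & pred))
    tn = int(np.sum(~truth & ~pred))
    fp = int(np.sum(~truth & pred))
    fn = int(np.sum(truth & ~pred))
    return tp, tn, fp, fn


def f1_score(truth, pred):
    tp, _, fp, fn = confusion(truth, pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # convention: 0.0 when undefined


def mcc(truth, pred):
    """Matthews correlation coefficient; 0.0 when a marginal count is zero."""
    tp, tn, fp, fn = confusion(truth, pred)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


# Synthetic data (an assumption, not the paper's datasets):
# large spread in features 0-1, small spread in features 2-4.
rng = np.random.default_rng(0)
n, n_out = 200, 3
normal = rng.normal(0.0, [10.0, 10.0, 0.1, 0.1, 0.1], size=(n, 5))
# Outliers look typical everywhere except feature 4, a low-variance direction.
outliers = np.zeros((n_out, 5))
outliers[:, 4] = 5.0
X = np.vstack([normal, outliers])
truth = np.concatenate([np.zeros(n, dtype=bool), np.ones(n_out, dtype=bool)])

# Detect outliers in the original space and after reducing to 2 dimensions.
pred_full = zscore_outliers(X)
pred_2d = zscore_outliers(pca_reduce(X, 2))

f1_full, mcc_full = f1_score(truth, pred_full), mcc(truth, pred_full)
f1_2d, mcc_2d = f1_score(truth, pred_2d), mcc(truth, pred_2d)
print(f"original 5-D: F1={f1_full:.2f}  MCC={mcc_full:.2f}")
print(f"PCA to 2-D:   F1={f1_2d:.2f}  MCC={mcc_2d:.2f}")
```

Because PCA keeps only the directions of largest variance, an anomaly confined to a low-variance feature can disappear after projection. In this toy setup the detector finds the synthetic outliers in the original space but not in the 2-D projection. This is the kind of effect the paper measures, although its detectors, datasets, and results are not reproduced here.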
Keywords: dimensionality reduction; outlier detection; multidimensional scaling; MDS; principal component analysis; PCA; peeling; F1-score; t-distributed stochastic neighbour embedding; t-SNE; Matthews correlation; relative information score; sensor networks; outlier preservation; outliers; performance measures.
Date: 2015
Downloads: http://www.inderscience.com/link.php?id=71365 (text/html). Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:injdan:v:7:y:2015:i:3:p:231-252
More articles in International Journal of Data Analysis Techniques and Strategies from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.