Embedding Methods and Robust Statistics for Dimension Reduction
George Ostrouchov () and
Nagiza F. Samatova
Additional contact information
George Ostrouchov: Computer Science and Mathematics Division, Oak Ridge National Laboratory
Nagiza F. Samatova: Computer Science and Mathematics Division, Oak Ridge National Laboratory
A chapter in COMPSTAT 2004 — Proceedings in Computational Statistics, 2004, pp 359-370 from Springer
Abstract:
Abstract Recently, several non-deterministic distance embedding methods that can be used for fast dimension reduction have been proposed in the machine learning literature. These include FastMap, MetricMap, and SparseMap. Among them, FastMap, implicitly assumes that the objects are points in a p-dimensional Euclidean space. It selects a sequence of k ≤ p orthogonal axes defined by distant pairs of points (called pivots) and computes the projection of the points onto the orthogonal axes. We show that FastMap picks all of its pivots from the vertices of the convex hull of the data points in the original implicit Euclidean space. This provides a connection to results in robust statistics, where the convex hull is used as a tool in multivariate outlier detection and in robust estimation methods. The connection sheds a new light on some properties of FastMap and provides an opportunity for a robust class of dimension reduction algorithms that we call RobustMaps, which retain the speed of FastMap and exploit ideas in robust statistics. One simple RobustMap algorithm is shown to outperform principal components on contaminated data both in terms of clean variance captured and in terms of time complexity.
Keywords: Dimension reduction; convex hull; FastMap; principal components; multidimensional scaling; robust statistics; Euclidean distance (search for similar items in EconPapers)
Date: 2004
References: Add references at CitEc
Citations:
There are no downloads for this item, see the EconPapers FAQ for hints about obtaining it.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-7908-2656-2_29
Ordering information: This item can be ordered from
http://www.springer.com/9783790826562
DOI: 10.1007/978-3-7908-2656-2_29
Access Statistics for this chapter
More chapters in Springer Books from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().