The Effect of Data Contamination in Sliced Inverse Regression and Finite Sample Breakdown Point
Ulrike Genschel (Iowa State University)
Sankhya A: The Indian Journal of Statistics, 2018, vol. 80, issue 1, No 2, 28-58
Abstract:
Dimension reduction procedures have received increasing consideration over the past decades. Despite this attention, the effect of data contamination or outlying data points on dimension reduction is not well understood, a problem compounded by the difficulty of classifying outliers in the presence of many variables. This paper formally investigates the influence of data contamination on sliced inverse regression (SIR), a prototypical dimension reduction procedure that targets a lower-dimensional subspace of a set of regressors needed to explain a response variable. We establish a general theory for how estimated reduction subspaces can be distorted through both the number and direction of outlying data points. The results depend critically on the regressor covariance structure, and the most harmful types of data contamination are shown to differ according to whether this covariance structure is known or unknown. For example, if the covariance structure is estimated, data contamination is proven to produce an estimated subspace that is automatically orthogonal to the directions of outlying data points, constituting a potentially serious loss of information. Our main results demonstrate the degree to which data contamination causes incorrect dimension reduction, depending on the amount, magnitude, and direction of contamination. Further, by metricizing distances between dimension reduction subspaces, worst-case results for data contamination can be formulated to define a finite sample breakdown point for SIR as a measure of global robustness. Our theoretical findings are illustrated through simulation.
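For readers unfamiliar with SIR, the procedure named in the abstract can be sketched as follows. This is a minimal textbook-style illustration with an estimated covariance, not the paper's code: the function name `sir_directions`, the slice count, and the simulated single-index data are all assumptions made for this example.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Basic sliced inverse regression using an estimated covariance (illustrative)."""
    n, p = X.shape
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Inverse square root of the covariance via its spectral decomposition.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt  # standardized regressors
    # Slice the observations by the ordered response and average Z within slices.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of the weighted covariance of slice means.
    _, vecs = np.linalg.eigh(M)
    top = vecs[:, ::-1][:, :n_dirs]
    # Map back to the original regressor scale and normalize each column.
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)

# Hypothetical clean data: the response depends on X only through its first coordinate.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
y = X[:, 0] + 0.1 * rng.standard_normal(2000)
B = sir_directions(X, y)
```

On uncontaminated data such as this, the estimated direction should align closely with the first coordinate axis, up to sign. Replacing some rows of `X` with outlying points is one way to explore the distortions the abstract describes, although the sketch itself makes no claim about those results.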
Keywords: Subspace estimation; Data contamination; Sliced inverse regression; Spectral decomposition; Breakdown
AMS subject classification: Primary 62G35; Secondary 62G08
Date: 2018
Downloads: http://link.springer.com/10.1007/s13171-017-0102-x (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:sankha:v:80:y:2018:i:1:d:10.1007_s13171-017-0102-x
Ordering information: This journal article can be ordered from
http://www.springer.com/statistics/journal/13171
DOI: 10.1007/s13171-017-0102-x
Sankhya A: The Indian Journal of Statistics is currently edited by Dipak Dey
More articles in Sankhya A: The Indian Journal of Statistics from Springer, Indian Statistical Institute
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.