Accounting for outliers in optimal subsampling methods
Laura Deldossi, Elena Pesce and Chiara Tommasi
Additional contact information
Laura Deldossi: Università Cattolica del Sacro Cuore
Elena Pesce: Swiss Re Institute, Swiss Re Management Ltd
Chiara Tommasi: University of Milan
Statistical Papers, 2023, vol. 64, issue 4, No 7, 1119-1135
Abstract:
Nowadays, in many different fields, massive data are available and, for several reasons, it might be convenient to analyze just a subset of the data. The D-optimality criterion can help to select a subsample of observations optimally. However, it is well known that D-optimal support points lie on the boundary of the design space and, if they go hand in hand with extreme response values, they can have a severe influence on the estimated linear model (leverage points with high influence). To overcome this problem, we first propose a non-informative “exchange” procedure that enables us to select a “nearly” D-optimal subset of observations without high leverage values. We then provide an informative version of this exchange procedure in which, besides high leverage points, outliers in the responses (which are not necessarily associated with high leverage points) are also avoided. This is possible because, unlike other design situations, in subsampling from big datasets the response values may be available. Finally, both the non-informative and informative selection procedures are adapted to I-optimality, with the goal of getting accurate predictions.
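The abstract does not spell out the exchange procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a linear model with design matrix `X`, screens out rows whose leverage exceeds a cutoff, and then runs a plain greedy exchange that swaps rows in and out of the subsample while the log-determinant of the information matrix keeps increasing. The function names, the `2p/N` rule-of-thumb cutoff, and the stopping rule are all assumptions made for illustration.

```python
import numpy as np

def leverage_scores(X):
    """Diagonal of the hat matrix X (X^T X)^{-1} X^T, computed via thin QR."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def log_det_info(X, idx):
    """log det of the information matrix X_s^T X_s for the rows in idx."""
    Xs = X[idx]
    sign, logdet = np.linalg.slogdet(Xs.T @ Xs)
    return logdet if sign > 0 else -np.inf

def exchange_subsample(X, n, lev_cut=None, max_sweeps=20, seed=0):
    """Greedy exchange for a 'nearly' D-optimal subsample of size n.

    Assumption: rows with leverage above lev_cut are never eligible.
    The default cutoff 2p/N is a common rule of thumb, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    N, p = X.shape
    h = leverage_scores(X)
    if lev_cut is None:
        lev_cut = 2 * p / N
    candidates = np.flatnonzero(h <= lev_cut)
    if len(candidates) < n:
        raise ValueError("not enough low-leverage rows for the requested size")

    # Random starting subsample drawn from the eligible rows.
    selected = list(rng.choice(candidates, size=n, replace=False))
    in_sample = np.zeros(N, dtype=bool)
    in_sample[selected] = True
    best = log_det_info(X, selected)

    for _ in range(max_sweeps):
        improved = False
        for pos in range(n):
            for j in candidates:
                if in_sample[j]:
                    continue
                trial = selected.copy()
                trial[pos] = j
                val = log_det_info(X, trial)
                if val > best + 1e-10:
                    # Accept the swap: row j replaces the row at position pos.
                    in_sample[selected[pos]] = False
                    in_sample[j] = True
                    selected, best = trial, val
                    improved = True
        if not improved:
            break  # no single swap improves the criterion any further
    return np.array(selected), best
```

Calling `exchange_subsample(X, n=10)` returns the selected row indices and the attained log-determinant. An informative variant in the spirit of the abstract would additionally screen rows on the basis of their response values; that step is not sketched here because the abstract does not describe how it is done.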
Keywords: D-optimality; I-optimality; Active learning; Subsampling; 62K05; 62D99; 62F35; 62J05
Date: 2023
Downloads: (external link)
http://link.springer.com/10.1007/s00362-023-01422-3 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:stpapr:v:64:y:2023:i:4:d:10.1007_s00362-023-01422-3
Ordering information: This journal article can be ordered from
http://www.springer. ... business/journal/362
DOI: 10.1007/s00362-023-01422-3
Statistical Papers is currently edited by C. Müller, W. Krämer and W.G. Müller
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.