Trimmed Statistical Estimation via Variance Reduction
Aleksandr Aravkin and
Damek Davis
Affiliations:
Aleksandr Aravkin: Department of Applied Mathematics, University of Washington, Seattle, Washington 98195
Damek Davis: School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853
Mathematics of Operations Research, 2020, vol. 45, issue 1, 292–322
Abstract:
In this paper, we show how to transform any optimization problem that arises from fitting a machine learning model into one that (1) detects and removes contaminated data from the training set while (2) simultaneously fitting the trimmed model on the uncontaminated data that remains. To solve the resulting nonconvex optimization problem, we introduce a fast stochastic proximal-gradient algorithm that incorporates prior knowledge through nonsmooth regularization. For data sets of size n, our approach requires O(n^{2/3}/ε) gradient evaluations to reach ε-accuracy, and when a certain error bound holds, the complexity improves to O(κn^{2/3} log(1/ε)), where κ is a “condition number.” These rates are n^{1/3} times better than those achieved by typical, nonstochastic methods.
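The paper's actual algorithm and analysis are in the linked article; purely as an illustration of the two ingredients the abstract describes, the Python sketch below alternates hard trimming (keep the h samples with the smallest current losses) with SVRG-style variance-reduced proximal-gradient steps, using an l1 penalty as the nonsmooth regularizer. All specifics are assumptions for illustration, not the paper's method: the squared loss, the SVRG-style gradient estimator, the minibatch size of roughly n^{2/3} (motivated by the stated O(n^{2/3}/ε) rate), and the names trimmed_svrg_lasso and soft_threshold are hypothetical.

    import numpy as np

    def soft_threshold(v, t):
        """Proximal map of t * ||.||_1 (elementwise soft-thresholding)."""
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def trimmed_svrg_lasso(A, b, h, lam=0.1, step=1e-2, epochs=30, seed=0):
        """Illustrative trimmed least squares with an l1 regularizer.

        Each epoch: (1) trim, i.e., keep the h samples with the smallest
        current losses; (2) run SVRG-style variance-reduced proximal-gradient
        steps on the kept samples. A sketch only, not the paper's algorithm.
        """
        rng = np.random.default_rng(seed)
        n, d = A.shape
        batch = max(1, min(h, round(n ** (2 / 3))))   # minibatch ~ n^{2/3}
        x = np.zeros(d)
        for _ in range(epochs):
            losses = 0.5 * (A @ x - b) ** 2
            keep = np.argsort(losses)[:h]             # trimming step
            Ak, bk = A[keep], b[keep]
            snapshot = x.copy()
            full_grad = Ak.T @ (Ak @ snapshot - bk) / h   # anchor gradient
            for _ in range(h // batch):
                idx = rng.choice(h, size=batch, replace=False)
                g_x = Ak[idx].T @ (Ak[idx] @ x - bk[idx]) / batch
                g_snap = Ak[idx].T @ (Ak[idx] @ snapshot - bk[idx]) / batch
                v = g_x - g_snap + full_grad          # variance-reduced gradient
                x = soft_threshold(x - step * v, step * lam)  # prox step
            # next epoch re-trims with the updated x
        return x, keep

    # Hypothetical smoke test: corrupt 10% of the responses and check
    # that the returned inlier set avoids the corrupted indices.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((500, 20))
    b = A @ rng.standard_normal(20)
    bad = rng.choice(500, size=50, replace=False)
    b[bad] += 10.0
    x_hat, keep = trimmed_svrg_lasso(A, b, h=450, lam=0.01)
    print(np.intersect1d(keep, bad).size)  # ideally 0

The fixed step size and epoch count here are arbitrary placeholders; the complexity guarantees quoted in the abstract depend on parameter choices and an analysis not reproduced in this sketch.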
Keywords: stochastic algorithms; nonsmooth; nonconvex optimization; trimmed estimators
Date: 2020
Downloads: https://doi.org/10.1287/moor.2019.0992 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:45:y:2020:i:1:p:292-322
More articles in Mathematics of Operations Research from INFORMS.