High-Probability Complexity Bounds for Non-smooth Stochastic Convex Optimization with Heavy-Tailed Noise

Eduard Gorbunov, Marina Danilova, Innokentiy Shibaev, Pavel Dvurechensky and Alexander Gasnikov
Additional contact information
Eduard Gorbunov: Mohamed bin Zayed University of Artificial Intelligence
Marina Danilova: Institute of Control Sciences RAS
Innokentiy Shibaev: Moscow Institute of Physics and Technology
Pavel Dvurechensky: Weierstrass Institute for Applied Analysis and Stochastics
Alexander Gasnikov: Moscow Institute of Physics and Technology

Journal of Optimization Theory and Applications, 2024, vol. 203, issue 3, No 21, 2679-2738

Abstract: Stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it is essential to guarantee theoretically that algorithms provide small objective residuals with high probability. Existing methods for non-smooth stochastic convex optimization have complexity bounds whose dependence on the confidence level is either negative-power, or logarithmic but only under an additional assumption of sub-Gaussian (light-tailed) noise that may not hold in practice. In our paper, we resolve this issue and derive the first high-probability convergence results with logarithmic dependence on the confidence level for non-smooth convex stochastic optimization problems with non-sub-Gaussian (heavy-tailed) noise. To derive our results, we propose novel stepsize rules for two stochastic methods with gradient clipping. Moreover, our analysis works for generalized smooth objectives with Hölder-continuous gradients, and for both methods we provide an extension for strongly convex problems. Finally, our results imply that the first (accelerated) method we consider also has optimal iteration and oracle complexity in all the regimes, and the second one is optimal in the non-smooth setting.
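The abstract does not spell out the clipping step, so the following is a minimal Python sketch of a clipped stochastic subgradient method, the generic technique underlying both of the paper's algorithms. The names (clipped_sgd, clip, the toy oracle) are illustrative assumptions; the paper's actual stepsize rules, acceleration, and high-probability analysis are not reproduced here. The intuition is that clipping bounds the influence of any single heavy-tailed noise realization on a step, which is what enables bounds with logarithmic dependence on the confidence level.

    import numpy as np

    def clip(g, lam):
        # Euclidean-norm clipping: scale g down if its norm exceeds lam.
        norm = np.linalg.norm(g)
        return g if norm <= lam else g * (lam / norm)

    def clipped_sgd(oracle, x0, n_iters, stepsize, clip_level):
        # Generic clipped stochastic subgradient method (a sketch, not the
        # paper's specific stepsize/clipping schedules or guarantees).
        x = np.asarray(x0, dtype=float)
        avg = x.copy()
        for k in range(1, n_iters + 1):
            g = oracle(x)                      # stochastic subgradient, noisy
            x = x - stepsize * clip(g, clip_level)
            avg += (x - avg) / (k + 1)         # running average of iterates
        return avg                             # averaging is standard for non-smooth convex problems

    # Toy usage: minimize f(x) = |x| under Student-t (df=3) gradient noise,
    # which has finite variance but heavy (non-sub-Gaussian) tails.
    rng = np.random.default_rng(0)
    oracle = lambda x: np.sign(x) + rng.standard_t(df=3, size=x.shape)
    print(clipped_sgd(oracle, x0=[5.0], n_iters=5000, stepsize=0.01, clip_level=1.0))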

Keywords: Stochastic optimization; Convex optimization; Non-smooth optimization; High-probability bounds; Heavy-tailed noise; 90C25; 90C15; 68Q25; 65Y20
Date: 2024

Downloads (external link): http://link.springer.com/10.1007/s10957-024-02533-z (abstract, text/html)
Access to the full text of the articles in this series is restricted.



Persistent link: https://EconPapers.repec.org/RePEc:spr:joptap:v:203:y:2024:i:3:d:10.1007_s10957-024-02533-z

Ordering information: This journal article can be ordered from
http://www.springer. ... cs/journal/10957/PS2

DOI: 10.1007/s10957-024-02533-z


Journal of Optimization Theory and Applications is currently edited by Franco Giannessi and David G. Hull

More articles in Journal of Optimization Theory and Applications from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Handle: RePEc:spr:joptap:v:203:y:2024:i:3:d:10.1007_s10957-024-02533-z