Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications
Rajeeva Laxman Karandikar (Chennai Mathematical Institute) and
Mathukumalli Vidyasagar (Indian Institute of Technology Hyderabad)
Journal of Optimization Theory and Applications, 2024, vol. 203, issue 3, No 13, 2412-2450
Abstract: In this paper, we study the convergence properties of the Stochastic Gradient Descent (SGD) method for finding a stationary point of a given objective function $J(\cdot)$. The objective function is not required to be convex. Rather, our results apply to a class of “invex” functions, which have the property that every stationary point is also a global minimizer. First, it is assumed that $J(\cdot)$ satisfies a property slightly weaker than the Kurdyka–Łojasiewicz (KL) condition, denoted here as (KL'). It is shown that the iterations $J(\theta_t)$ converge almost surely to the global minimum of $J(\cdot)$. Next, the hypothesis on $J(\cdot)$ is strengthened from (KL') to the Polyak–Łojasiewicz (PL) condition. Under this stronger hypothesis, we derive estimates on the rate of convergence of $J(\theta_t)$ to its limit. Using these results, we show that for functions satisfying the PL property, the convergence rate of both the objective function and the norm of the gradient with SGD is the same as the best possible rate for convex functions. While some results along these lines have been published in the past, our contributions contain two distinct improvements. First, the assumptions on the stochastic gradient are more general than elsewhere; second, our convergence is almost sure, and not in expectation. We also study SGD when only function evaluations are permitted. In this setting, we determine the “optimal” increments, i.e., the size of the perturbations. Using the same set of ideas, we establish the global convergence of the Stochastic Approximation (SA) algorithm under more general assumptions on the measurement error than in the existing literature. We also derive bounds on the rate of convergence of the SA algorithm under appropriate assumptions.
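To illustrate the setting the abstract describes (not the authors' own algorithm or proof technique), the following sketch runs SGD with a noisy gradient oracle on a quadratic objective, which satisfies the PL condition; the step sizes $\alpha_t = 1/t$ satisfy the standard Robbins–Monro conditions. The noise scale and dimension are arbitrary choices for demonstration.

```python
import numpy as np

# Illustrative sketch: minimize J(theta) = 0.5 * ||theta||^2, which
# satisfies the Polyak-Lojasiewicz (PL) condition with modulus mu = 1.
# The oracle returns grad J(theta) = theta plus zero-mean Gaussian noise.
rng = np.random.default_rng(0)

def noisy_gradient(theta):
    # True gradient corrupted by zero-mean measurement noise.
    return theta + rng.normal(scale=0.1, size=theta.shape)

theta = np.ones(5)
for t in range(1, 10_001):
    alpha = 1.0 / t                  # sum(alpha_t) = inf, sum(alpha_t^2) < inf
    theta = theta - alpha * noisy_gradient(theta)

J = 0.5 * float(theta @ theta)
print(J)  # J(theta_t) approaches the global minimum 0
```

With these diminishing step sizes, $J(\theta_t)$ converges almost surely to the global minimum despite the persistent gradient noise; with a constant step size it would instead hover at a noise floor.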
Keywords: Stochastic gradient descent; Stochastic approximation; Nonconvex optimization; Martingale methods; 62L20; 60G17; 93D05
Date: 2024
Downloads: http://link.springer.com/10.1007/s10957-024-02547-7 (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:joptap:v:203:y:2024:i:3:d:10.1007_s10957-024-02547-7
Ordering information: This journal article can be ordered from
http://www.springer. ... cs/journal/10957/PS2
DOI: 10.1007/s10957-024-02547-7
Journal of Optimization Theory and Applications is currently edited by Franco Giannessi and David G. Hull
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.