Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods

Nicolas Loizou and Peter Richtárik
Additional contact information
Nicolas Loizou: Université de Montréal
Peter Richtárik: King Abdullah University of Science and Technology (KAUST)

Computational Optimization and Applications, 2020, vol. 77, issue 3, No 4, 653-710

Abstract: In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent: convex quadratic problems. We prove global non-asymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates, and dual function values. We also show that the primal iterates converge at an accelerated linear rate in a somewhat weaker sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
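For a concrete picture of the heavy ball momentum scheme described in the abstract, the sketch below applies it to randomized Kaczmarz for a consistent linear system Ax = b, one of the special cases covered by the paper's framework. It is a minimal illustration in Python, not the authors' code; the step size omega, momentum parameter beta, iteration count, and squared-norm row sampling are illustrative choices rather than the values prescribed by the paper's theory.

import numpy as np

def kaczmarz_momentum(A, b, omega=1.0, beta=0.4, iters=5000, seed=0):
    """Randomized Kaczmarz with heavy ball momentum for a consistent
    system Ax = b: x_{k+1} = x_k - omega * step_k + beta * (x_k - x_{k-1})."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x_prev = np.zeros(n)
    x = np.zeros(n)
    row_norms2 = np.einsum('ij,ij->i', A, A)
    probs = row_norms2 / row_norms2.sum()   # sample rows proportional to squared norm
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        step = (A[i] @ x - b[i]) / row_norms2[i] * A[i]   # stochastic (Kaczmarz) step
        x_new = x - omega * step + beta * (x - x_prev)    # heavy ball momentum term
        x_prev, x = x, x_new
    return x

# Usage on a small random consistent system
A = np.random.randn(100, 20)
x_true = np.random.randn(20)
b = A @ x_true
x_hat = kaczmarz_momentum(A, b)
print(np.linalg.norm(x_hat - x_true))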

Keywords: Stochastic methods; Heavy ball momentum; Linear systems; Randomized coordinate descent; Randomized Kaczmarz; Stochastic gradient descent; Stochastic Newton; Quadratic optimization; Convex optimization
Mathematics Subject Classification: 68Q25; 68W20; 68W40; 65Y20; 90C15; 90C20; 90C25; 15A06; 15B52; 65F10
Date: 2020
References: reference list available in EconPapers and from CitEc
Citations: 1 (view citations in EconPapers)

Downloads: (external link)
http://link.springer.com/10.1007/s10589-020-00220-z Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:coopap:v:77:y:2020:i:3:d:10.1007_s10589-020-00220-z

Ordering information: This journal article can be ordered from
http://www.springer.com/math/journal/10589

DOI: 10.1007/s10589-020-00220-z


Computational Optimization and Applications is currently edited by William W. Hager

More articles in Computational Optimization and Applications from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Handle: RePEc:spr:coopap:v:77:y:2020:i:3:d:10.1007_s10589-020-00220-z