On the Parallelization Upper Bound for Asynchronous Stochastic Gradients Descent in Non-convex Optimization
Lifu Wang and Bo Shen
Additional contact information
Lifu Wang: Beijing Jiaotong University
Bo Shen: Beijing Jiaotong University
Journal of Optimization Theory and Applications, 2023, vol. 196, issue 3, No 6, 900-935
Abstract: In deep learning, asynchronous parallel stochastic gradient descent (APSGD) is a widely used algorithm for speeding up training. In asynchronous systems, the time delay of stale gradients is generally proportional to the total number of workers. When the number of workers is too large, the delayed gradients deviate substantially from the true gradient and may slow the convergence of the algorithm. One may ask: how many workers can be used at most while still achieving both convergence and a speedup? In this paper, we consider the asynchronous training problem in the non-convex case. We theoretically study the problem of finding an approximate second-order stationary point using asynchronous algorithms in non-convex optimization and investigate the behavior of APSGD near saddle points. This work gives the first theoretical guarantee for finding an approximate second-order stationary point with asynchronous algorithms, together with a provable upper bound on the time delay. The techniques we provide can be generalized to analyze other types of asynchronous algorithms and to understand their behavior in distributed asynchronous parallel training.
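The abstract does not spell out the update rule, but a common model in this literature is the delayed-gradient iteration x_{t+1} = x_t - eta * grad f(x_{t-tau}), where the staleness tau scales with the number of workers. Below is a minimal Python sketch of that generic model, not the authors' exact algorithm; the toy objective, step size, and delay values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def grad(x, noise=0.1):
    # Stochastic gradient of the toy non-convex objective f(x) = (x^2 - 1)^2 / 4,
    # which has a strict saddle point (local max) at x = 0 and minima at x = +/-1.
    return x * (x**2 - 1) + noise * rng.standard_normal()

def apsgd(x0, eta=0.01, tau=4, steps=4000):
    # tau models gradient staleness; in an asynchronous system it is typically
    # on the order of the number of workers.
    history = [x0] * (tau + 1)      # past iterates kept for delayed reads, oldest first
    x = x0
    for _ in range(steps):
        stale_x = history.pop(0)    # the gradient is computed at an iterate tau steps old
        x = float(np.clip(x - eta * grad(stale_x), -10.0, 10.0))  # clip keeps the demo bounded
        history.append(x)
    return x

print(apsgd(x0=0.05, tau=4))    # small delay: escapes the saddle, settles near x = +/-1
print(apsgd(x0=0.05, tau=200))  # delay past the stability threshold: iterates oscillate

With a small delay the iterates escape the saddle at x = 0 and settle near a minimum; past a delay threshold that depends on the step size and curvature, the same update destabilizes. This is the convergence-versus-parallelism trade-off that the paper's upper bound on the time delay addresses.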
Keywords: Non-convex optimization; Asynchronous parallel optimization; Saddle points escaping; Deep learning
Date: 2023
Downloads: http://link.springer.com/10.1007/s10957-022-02141-9 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:joptap:v:196:y:2023:i:3:d:10.1007_s10957-022-02141-9
Ordering information: This journal article can be ordered from http://www.springer. ... cs/journal/10957/PS2
DOI: 10.1007/s10957-022-02141-9
Journal of Optimization Theory and Applications is currently edited by Franco Giannessi and David G. Hull