On the Parallelization Upper Bound for Asynchronous Stochastic Gradients Descent in Non-convex Optimization
Lifu Wang and Bo Shen
Additional contact information
Lifu Wang: Beijing Jiaotong University
Bo Shen: Beijing Jiaotong University
Journal of Optimization Theory and Applications, 2023, vol. 196, issue 3, No 6, 900-935
Abstract: In deep learning, asynchronous parallel stochastic gradient descent (APSGD) is a widely used algorithm for speeding up training. In asynchronous systems, the time delay of stale gradients is generally proportional to the total number of workers. When the number of workers is too large, the delayed gradients deviate substantially from the true gradient and may slow the convergence of the algorithm. One may ask: how many workers can be used at most while still achieving both convergence and a speedup? In this paper, we consider the asynchronous training problem in the non-convex case. We theoretically study the problem of finding an approximate second-order stationary point using asynchronous algorithms in non-convex optimization and investigate the behavior of APSGD near saddle points. This work gives the first theoretical guarantee for finding an approximate second-order stationary point with asynchronous algorithms, together with a provable upper bound on the time delay. The techniques we provide can be generalized to analyze other types of asynchronous algorithms and to understand their behavior in distributed asynchronous parallel training.
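The abstract does not spell out the update rule, but a common model in this literature is the delayed-gradient iteration x_{t+1} = x_t - eta * grad f(x_{t-tau}), where the staleness tau scales with the number of workers. Below is a minimal Python sketch of that generic model, not the authors' exact algorithm; the toy objective, step size, and delay values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def grad(x, noise=0.1):
    # Stochastic gradient of the toy non-convex objective f(x) = (x^2 - 1)^2 / 4,
    # which has a strict saddle point (local max) at x = 0 and minima at x = +/-1.
    return x * (x**2 - 1) + noise * rng.standard_normal()

def apsgd(x0, eta=0.01, tau=4, steps=4000):
    # tau models gradient staleness; in an asynchronous system it is typically
    # on the order of the number of workers.
    history = [x0] * (tau + 1)      # past iterates kept for delayed reads, oldest first
    x = x0
    for _ in range(steps):
        stale_x = history.pop(0)    # the gradient is computed at an iterate tau steps old
        x = float(np.clip(x - eta * grad(stale_x), -10.0, 10.0))  # clip keeps the demo bounded
        history.append(x)
    return x

print(apsgd(x0=0.05, tau=4))    # small delay: escapes the saddle, settles near x = +/-1
print(apsgd(x0=0.05, tau=200))  # delay past the stability threshold: iterates oscillate

With a small delay the iterates escape the saddle at x = 0 and settle near a minimum; past a delay threshold that depends on the step size and curvature, the same update destabilizes. This is the convergence-versus-parallelism trade-off that the paper's upper bound on the time delay addresses.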
Keywords: Non-convex optimization; Asynchronous parallel optimization; Saddle points escaping; Deep learning
Date: 2023
Downloads: http://link.springer.com/10.1007/s10957-022-02141-9 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:joptap:v:196:y:2023:i:3:d:10.1007_s10957-022-02141-9
Ordering information: This journal article can be ordered from http://www.springer. ... cs/journal/10957/PS2
DOI: 10.1007/s10957-022-02141-9
Journal of Optimization Theory and Applications is currently edited by Franco Giannessi and David G. Hull