On Stochastic Roundoff Errors in Gradient Descent with Low-Precision Computation
Lu Xia, Stefano Massei, Michiel E. Hochstenbach and Barry Koren
Additional contact information
Lu Xia: Eindhoven University of Technology
Stefano Massei: Università di Pisa
Michiel E. Hochstenbach: Eindhoven University of Technology
Barry Koren: Eindhoven University of Technology
Journal of Optimization Theory and Applications, 2024, vol. 200, issue 2, No 8, 634-668
Abstract:
When implementing the gradient descent method in low precision, stochastic rounding schemes help prevent the stagnation of convergence caused by the vanishing-gradient effect. Unbiased stochastic rounding yields zero bias by preserving small updates with probabilities proportional to their relative magnitudes. This study provides a theoretical explanation for the stagnation of the gradient descent method in low-precision computation. Additionally, we propose two new stochastic rounding schemes that trade the zero-bias property for a larger probability of preserving small gradients. Our methods yield a constant rounding bias that, on average, lies in a descent direction. For convex problems, we prove that the proposed rounding methods typically have a beneficial effect on the convergence rate of gradient descent. We validate our theoretical analysis by comparing the performance of various rounding schemes when optimizing a multinomial logistic regression model and when training a simple neural network with an 8-bit floating-point format.
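The stagnation mechanism can be made concrete with a minimal Python sketch. This is not the authors' code, and it does not implement the two new rounding schemes proposed in the paper. It simulates a low-precision format by computing each result in double precision and then rounding it to t significant bits. It assumes an unbounded exponent range, so overflow, underflow and subnormals are ignored. With round-to-nearest, an update smaller than half the spacing of representable numbers around x is lost, so the iterate never moves. Unbiased stochastic rounding instead rounds a value v lying between neighbours a < v < b up to b with probability (v - a)/(b - a) and down to a otherwise, so that E[SR(v)] = v. The following choices are illustrative assumptions and do not come from the paper: the objective f(x) = x^2/2, the step size 0.01, and the helper names. The same applies to t = 4, which matches the significand precision of an 8-bit format such as FP8 E4M3 (3 stored bits plus 1 implicit bit). The paper's exact 8-bit format is not stated here.

```python
import math
import random
from typing import Callable


def round_nearest(x: float, t: int) -> float:
    """Round x to t significant bits with round-to-nearest (ties to even)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** t
    # Python's round() on a float breaks ties to even.
    return math.ldexp(round(m * scale) / scale, e)


def round_stochastic(x: float, t: int, rng: random.Random) -> float:
    """Unbiased stochastic rounding of x to t significant bits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)
    scaled = m * 2.0 ** t           # significand expressed in units of the spacing
    lo = math.floor(scaled)         # lower representable neighbour (as an integer)
    frac = scaled - lo              # relative distance to the lower neighbour
    # Round up with probability frac and down otherwise, so E[result] == x.
    k = lo + 1 if rng.random() < frac else lo
    return math.ldexp(k / 2.0 ** t, e)


def gradient_descent(x0: float, alpha: float, steps: int,
                     rnd: Callable[[float], float]) -> float:
    """Gradient descent on the assumed objective f(x) = x**2 / 2 (gradient = x).

    Every arithmetic result is passed through the rounding function rnd.
    """
    x = x0
    for _ in range(steps):
        grad = x
        update = rnd(alpha * grad)  # low-precision product
        x = rnd(x - update)         # low-precision subtraction
    return x


if __name__ == "__main__":
    T = 4          # assumed significand precision (as in FP8 E4M3)
    ALPHA = 0.01   # assumed step size
    rng = random.Random(0)
    x_rn = gradient_descent(1.0, ALPHA, 500, lambda v: round_nearest(v, T))
    x_sr = gradient_descent(1.0, ALPHA, 500, lambda v: round_stochastic(v, T, rng))
    print(f"round-to-nearest: {x_rn}, stochastic rounding: {x_sr}")
```

In this toy setting, round-to-nearest stays at x = 1 indefinitely, because a relative step of 0.01 is below half the relative spacing of a 4-bit significand. With the stochastic variant, both roundings are unbiased, so the expected iterate shrinks by the factor (1 - 0.01) per step, just as in exact arithmetic.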
Keywords: Gradient descent method; Stochastic roundoff error analysis; Low-precision computation; Convergence analysis; Logistic regression; Neural networks; 62J02; 65G50; 68T01
Date: 2024
Downloads: http://link.springer.com/10.1007/s10957-023-02345-7 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:joptap:v:200:y:2024:i:2:d:10.1007_s10957-023-02345-7
Ordering information: This journal article can be ordered from
http://www.springer. ... cs/journal/10957/PS2
DOI: 10.1007/s10957-023-02345-7
Journal of Optimization Theory and Applications is currently edited by Franco Giannessi and David G. Hull
More articles in Journal of Optimization Theory and Applications from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.