Eigenvalue-Corrected Natural Gradient Based on a New Approximation
Kaixin Gao,
Zheng-Hai Huang,
Xiaolei Liu,
Min Wang,
Shuangling Wang,
Zidong Wang,
Dachuan Xu and
Fan Yu
Author affiliations
Kaixin Gao: School of Mathematics, Tianjin University, Tianjin 300350, P. R. China
Zheng-Hai Huang: School of Mathematics, Tianjin University, Tianjin 300350, P. R. China
Xiaolei Liu: School of Mathematics, Tianjin University, Tianjin 300350, P. R. China
Min Wang: Central Software Institute, Huawei Technologies Co. Ltd, Hangzhou 310051, P. R. China
Shuangling Wang: Central Software Institute, Huawei Technologies Co. Ltd, Hangzhou 310051, P. R. China
Zidong Wang: Central Software Institute, Huawei Technologies Co. Ltd, Hangzhou 310051, P. R. China
Dachuan Xu: Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing 100124, P. R. China
Fan Yu: Central Software Institute, Huawei Technologies Co. Ltd, Hangzhou 310051, P. R. China
Asia-Pacific Journal of Operational Research (APJOR), 2023, vol. 40, issue 01, 1-18
Abstract:
Second-order optimization methods for training deep neural networks (DNNs) have attracted considerable research interest. A recently proposed method, Eigenvalue-corrected Kronecker Factorization (EKFAC), interprets the natural gradient update as a diagonal method and corrects the inaccurate re-scaling factors in the Kronecker-factored eigenbasis of KFAC. Another recent method, Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), approximates the Fisher information matrix (FIM) as a constant multiple of the Kronecker product of two matrices, with the constant chosen so that the trace is kept equal before and after the approximation. In this work, we combine the ideas of these two methods and propose Trace-restricted Eigenvalue-corrected Kronecker Factorization (TEKFAC). The proposed method not only corrects the inexact re-scaling factors under the Kronecker-factored eigenbasis, but also adopts the new approximation and the effective damping technique of TKFAC. We also discuss the differences and relationships among the related Kronecker-factored approximations. Empirically, our method outperforms SGD with momentum, Adam, EKFAC and TKFAC on several DNNs.
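As a rough sketch of the approximations the abstract refers to (the notation below is assumed for illustration and is not taken from the article): for a layer with input activations a and back-propagated gradients g, KFAC-style methods factor the layer-wise FIM block as a Kronecker product, a TKFAC-style scalar rescales it so the trace is preserved, and an EKFAC-style correction re-estimates the eigenvalues as second moments of the gradient in the Kronecker-factored eigenbasis:

% Sketch only; symbols (a, g, A, G, pi, U_A, U_G) are assumed notation.
% Layer-wise Fisher block, Kronecker-factored (KFAC):
\[
  F \;\approx\; A \otimes G, \qquad
  A = \mathbb{E}\!\left[a a^{\top}\right], \quad
  G = \mathbb{E}\!\left[g g^{\top}\right]
\]
% Trace-restricted variant: a constant keeps the trace equal before and after
% the approximation, using tr(A \otimes G) = tr(A) tr(G):
\[
  F \;\approx\; \pi \,(A \otimes G), \qquad
  \pi = \frac{\operatorname{tr}(F)}{\operatorname{tr}(A)\,\operatorname{tr}(G)}
\]
% Eigenvalue correction: with A = U_A S_A U_A^T and G = U_G S_G U_G^T, the
% Kronecker-factored eigenbasis is U_A \otimes U_G, and the re-scaling factors
% are re-estimated from the gradient projected onto that basis:
\[
  s_i^{*} \;=\; \mathbb{E}\!\left[\Big(\big(U_A \otimes U_G\big)^{\top} \nabla_{\theta}\mathcal{L}\Big)_i^{2}\right]
\]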
Keywords: Natural gradient; Kronecker-factored approximation; deep neural networks
Date: 2023
Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0217595923400055
Access to full text is restricted to subscribers
Persistent link: https://EconPapers.repec.org/RePEc:wsi:apjorx:v:40:y:2023:i:01:n:s0217595923400055
DOI: 10.1142/S0217595923400055
Asia-Pacific Journal of Operational Research (APJOR) is currently edited by Gongyun Zhao
More articles in Asia-Pacific Journal of Operational Research (APJOR) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for this series is maintained by Tai Tone Lim.