Eigenvalue-Corrected Natural Gradient Based on a New Approximation

Kaixin Gao, Zheng-Hai Huang, Xiaolei Liu, Min Wang, Shuangling Wang, Zidong Wang, Dachuan Xu and Fan Yu
Additional contact information
Kaixin Gao: School of Mathematics, Tianjin University, Tianjin 300350, P. R. China
Zheng-Hai Huang: School of Mathematics, Tianjin University, Tianjin 300350, P. R. China
Xiaolei Liu: School of Mathematics, Tianjin University, Tianjin 300350, P. R. China
Min Wang: Central Software Institute, Huawei Technologies Co. Ltd, Hangzhou 310051, P. R. China
Shuangling Wang: Central Software Institute, Huawei Technologies Co. Ltd, Hangzhou 310051, P. R. China
Zidong Wang: Central Software Institute, Huawei Technologies Co. Ltd, Hangzhou 310051, P. R. China
Dachuan Xu: Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing 100124, P. R. China
Fan Yu: Central Software Institute, Huawei Technologies Co. Ltd, Hangzhou 310051, P. R. China

Asia-Pacific Journal of Operational Research (APJOR), 2023, vol. 40, issue 01, 1-18

Abstract: Second-order optimization methods for training deep neural networks (DNNs) have attracted considerable research interest. One recently proposed method, Eigenvalue-corrected Kronecker Factorization (EKFAC), interprets the natural gradient update as a diagonal method and corrects the inaccurate re-scaling factors in the Kronecker-factored eigenbasis. Another, Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), approximates the Fisher information matrix (FIM) as a constant multiple of the Kronecker product of two matrices, chosen so that the trace is preserved by the approximation. In this work, we combine the ideas of these two methods and propose Trace-restricted Eigenvalue-corrected Kronecker Factorization (TEKFAC). The proposed method not only corrects the inexact re-scaling factors under the Kronecker-factored eigenbasis, but also adopts the approximation and the effective damping technique of TKFAC. We also discuss the differences and relationships among the related Kronecker-factored approximations. Empirically, our method outperforms SGD with momentum, Adam, EKFAC and TKFAC on several DNNs.
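
For readers unfamiliar with these approximations, the core quantities can be sketched as follows (notation ours, following the KFAC/EKFAC/TKFAC literature; the paper's exact formulation and damping may differ). For a layer \ell with weight matrix W_\ell, KFAC approximates the layer's FIM block as a Kronecker product of second-moment matrices of the input activations a and the back-propagated gradients g:

F_\ell \approx A_\ell \otimes B_\ell, \qquad A_\ell = \mathbb{E}[a\,a^{\top}], \quad B_\ell = \mathbb{E}[g\,g^{\top}].

TKFAC inserts a scalar so that the trace is preserved by the approximation, using \operatorname{tr}(A \otimes B) = \operatorname{tr}(A)\,\operatorname{tr}(B):

F_\ell \approx \pi_\ell\,(A_\ell \otimes B_\ell), \qquad \pi_\ell = \frac{\operatorname{tr}(F_\ell)}{\operatorname{tr}(A_\ell)\,\operatorname{tr}(B_\ell)}.

EKFAC keeps the Kronecker-factored eigenbasis U_A \otimes U_B (from the eigendecompositions A_\ell = U_A \Lambda_A U_A^{\top} and B_\ell = U_B \Lambda_B U_B^{\top}) but replaces the Kronecker eigenvalues \Lambda_A \otimes \Lambda_B with the re-scaling factors

s_i^{*} = \mathbb{E}\Big[\big((U_A \otimes U_B)^{\top} \operatorname{vec}(\nabla_{W_\ell}\mathcal{L})\big)_i^{2}\Big],

which give the optimal diagonal rescaling in that basis under the Frobenius norm. TEKFAC applies this eigenvalue correction on top of the trace-restricted factorization.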

Keywords: Natural gradient; Kronecker-factored approximation; deep neural networks (search for similar items in EconPapers)
Date: 2023

Downloads: http://www.worldscientific.com/doi/abs/10.1142/S0217595923400055
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:wsi:apjorx:v:40:y:2023:i:01:n:s0217595923400055

DOI: 10.1142/S0217595923400055

Asia-Pacific Journal of Operational Research (APJOR) is currently edited by Gongyun Zhao

More articles in Asia-Pacific Journal of Operational Research (APJOR) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.

 
Handle: RePEc:wsi:apjorx:v:40:y:2023:i:01:n:s0217595923400055