Damped Newton Stochastic Gradient Descent Method for Neural Networks Training
Jingcheng Zhou,
Wei Wei,
Ruizhi Zhang and
Zhiming Zheng
Additional contact information
Jingcheng Zhou, Wei Wei, Ruizhi Zhang and Zhiming Zheng: School of Mathematical Sciences, Beihang University, Beijing 100191, China
Mathematics, 2021, vol. 9, issue 13, 1-12
Abstract:
First-order methods such as stochastic gradient descent (SGD) have become popular optimizers for training deep neural networks (DNNs) because they generalize well; however, they require long training times. Second-order methods, which can shorten training, are rarely used because of the high computational cost of obtaining second-order information. Many works therefore approximate the Hessian matrix to reduce this cost, but the approximation can deviate substantially from the true Hessian. In this paper, we exploit the convexity of the loss with respect to part of the parameters and propose the damped Newton stochastic gradient descent (DN-SGD) and stochastic gradient descent damped Newton (SGD-DN) methods for training DNNs on regression problems with mean square error (MSE) loss and classification problems with cross-entropy loss (CEL). Unlike second-order methods that estimate the Hessian matrix over all parameters, our methods compute the exact Hessian only for a small subset of the parameters, which greatly reduces the computational cost and makes the learning process converge faster and more accurately than SGD and Adagrad. Several numerical experiments on real datasets verify the effectiveness of our methods on both regression and classification problems.
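The abstract describes an alternating scheme: most parameters are updated with plain first-order SGD, while a damped Newton step is taken on the small block of parameters in which the loss is convex. The following is a minimal NumPy sketch of that idea for an MSE regression problem, not the authors' implementation; the one-hidden-layer architecture, the choice of the output layer as the exactly-computed block, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

# Sketch (not the authors' code): one-hidden-layer regression network.
# Hidden-layer weights are updated by SGD; output-layer weights get a
# damped Newton step, since the MSE loss is a convex quadratic in them.

rng = np.random.default_rng(0)

# Toy regression data: y = sum(x) + noise (illustrative assumption).
X = rng.normal(size=(512, 8))
y = X.sum(axis=1, keepdims=True) + 0.1 * rng.normal(size=(512, 1))

n_hidden = 32
W1 = rng.normal(scale=0.1, size=(8, n_hidden))   # hidden-layer weights
w2 = rng.normal(scale=0.1, size=(n_hidden, 1))   # output-layer weights

lr, damping, batch = 0.05, 1e-2, 64

for step in range(200):
    idx = rng.choice(len(X), batch, replace=False)
    xb, yb = X[idx], y[idx]

    # Forward pass with ReLU hidden activations.
    h = np.maximum(xb @ W1, 0.0)           # (batch, n_hidden)
    err = h @ w2 - yb                      # residuals, (batch, 1)

    # First-order (SGD) step on the hidden layer.
    grad_h = (err @ w2.T) * (h > 0)        # backprop through ReLU
    W1 -= lr * (xb.T @ grad_h) / batch

    # Damped Newton step on the output layer: for MSE the Hessian
    # block is H = h^T h / batch, and damping keeps it invertible.
    g2 = h.T @ err / batch
    H = h.T @ h / batch
    w2 -= np.linalg.solve(H + damping * np.eye(n_hidden), g2)

mse = np.mean((np.maximum(X @ W1, 0.0) @ w2 - y) ** 2)
print(f"final training MSE: {mse:.4f}")
```

Presumably, the order of the two updates within an iteration is what distinguishes the two variants named in the abstract: DN-SGD applies the damped Newton step before the SGD step, and SGD-DN applies it after.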
Keywords: stochastic gradient descent; damped Newton; convexity
JEL-codes: C
Date: 2021
Downloads:
https://www.mdpi.com/2227-7390/9/13/1533/pdf (application/pdf)
https://www.mdpi.com/2227-7390/9/13/1533/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:9:y:2021:i:13:p:1533-:d:585237