aSGD: Stochastic Gradient Descent with Adaptive Batch Size for Every Parameter
Haoze Shi,
Naisen Yang,
Hong Tang and
Xin Yang
Additional contact information
Haoze Shi: College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China
Naisen Yang: Environment Research Institute, Shandong University, Qingdao 266237, China
Hong Tang: State Key Laboratory of Remote Sensing Science, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
Xin Yang: College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China
Mathematics, 2022, vol. 10, issue 6, 1-15
Abstract:
In recent years, deep neural networks (DNNs) have been widely used in many fields. Because a deep network has numerous parameters, much effort has been devoted to training. Complex optimizers with many hyperparameters have been used to accelerate network training and improve generalization, but tuning these hyperparameters is often a trial-and-error process. In this paper, we visually analyze the different roles that training samples play in a parameter update and find that each training sample contributes differently to the update. Building on this, we present a variant of batch stochastic gradient descent for neural networks that use the ReLU activation function in the hidden layers, called adaptive stochastic gradient descent (aSGD). Unlike existing methods, it computes an adaptive batch size for each parameter in the model and uses the mean effective gradient as the actual gradient for parameter updates. Experiments on MNIST show that aSGD speeds up the optimization of DNNs and achieves higher accuracy without extra hyperparameters. Experiments on synthetic datasets show that it finds redundant nodes effectively, which is helpful for model compression.
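The core idea can be illustrated with a minimal NumPy sketch. This is an assumption-laden reading of the abstract, not the authors' implementation: in a ReLU network, an inactive unit contributes an exactly-zero gradient for a given sample, so here the "adaptive batch size" of a parameter is taken to be the number of samples with a nonzero gradient contribution to it, and the "mean effective gradient" averages only over those samples. The function name `asgd_update` and all shapes are hypothetical.

```python
import numpy as np

def asgd_update(per_sample_grads, weights, lr=0.01):
    """One aSGD-style step (sketch): for each parameter, average the
    gradient only over the samples that actually contribute to it.

    per_sample_grads: array of shape (batch, *param_shape), one gradient
    per training sample (in a ReLU network, samples for which a unit is
    inactive contribute exactly zero).
    """
    # Adaptive batch size per parameter: how many samples have a
    # nonzero gradient contribution for that parameter.
    active = (per_sample_grads != 0).sum(axis=0)
    # Mean effective gradient: per-parameter sum divided by the
    # per-parameter count; when no sample is active the sum is zero,
    # so dividing by 1 keeps the update at zero.
    eff_grad = per_sample_grads.sum(axis=0) / np.maximum(active, 1)
    return weights - lr * eff_grad

# Toy example: 4 samples, 3 parameters, with some zero contributions.
g = np.array([[1.0, 0.0, 2.0],
              [3.0, 0.0, 0.0],
              [0.0, 4.0, 0.0],
              [0.0, 0.0, 2.0]])
w = np.zeros(3)
w_new = asgd_update(g, w, lr=1.0)
# param 0 averages over 2 samples, param 1 over 1, param 2 over 2,
# whereas plain batch SGD would divide every sum by the batch size 4.
```

Note that the second parameter, active in only one sample, receives the same step magnitude as it would from that sample alone, instead of being diluted by the three inactive samples.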
Keywords: deep network optimization; adaptive gradient descent; batch size
JEL-codes: C
Date: 2022
Downloads: (external link)
https://www.mdpi.com/2227-7390/10/6/863/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/6/863/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:6:p:863-:d:766964
Mathematics is currently edited by Ms. Emma He