Effect of Initial Configuration of Weights on Training and Function of Artificial Neural Networks
Ricardo J. Jesus,
Mário L. Antunes,
Rui A. da Costa,
Sergey N. Dorogovtsev,
José F. F. Mendes and
Rui L. Aguiar
Additional contact information
Ricardo J. Jesus: Departamento de Eletrónica, Telecomunicações e Informática, Campus Universitário de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal
Mário L. Antunes: Departamento de Eletrónica, Telecomunicações e Informática, Campus Universitário de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal
Rui A. da Costa: Departamento de Física & I3N, Campus Universitário de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal
Sergey N. Dorogovtsev: Departamento de Física & I3N, Campus Universitário de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal
José F. F. Mendes: Departamento de Física & I3N, Campus Universitário de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal
Rui L. Aguiar: Departamento de Eletrónica, Telecomunicações e Informática, Campus Universitário de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal
Mathematics, 2021, vol. 9, issue 18, 1-17
Abstract:
The function and performance of neural networks are largely determined by the evolution of their weights and biases during training, from the initial configuration of these parameters to one of the local minima of the loss function. We perform a quantitative statistical characterization of the deviation of the weights of two-hidden-layer feedforward ReLU networks of various sizes, trained via Stochastic Gradient Descent (SGD), from their initial random configuration. We compare the evolution of the distribution function of this deviation with the evolution of the loss during training. We observe that successful training via SGD leaves the network in the close neighborhood of the initial configuration of its weights. For each initial weight of a link, we measure the distribution function of the deviation from this value after training and find how the moments and the peak of this distribution depend on the initial weight. Exploring the evolution of these deviations during training, we observe an abrupt increase within the overfitting region, occurring simultaneously with a similarly abrupt increase in the loss function. Our results suggest that SGD’s ability to efficiently find local minima is restricted to the vicinity of the random initial configuration of the weights.
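The quantity at the heart of the abstract, the per-weight deviation from the initial random configuration tracked alongside the loss over the course of SGD training, can be illustrated with a short sketch. The following hypothetical PyTorch example is not the authors' code: the synthetic dataset, layer widths, learning rate, and batch size are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two-hidden-layer feedforward ReLU network; the layer widths here are
# arbitrary stand-ins for the "various sizes" studied in the paper.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# Snapshot of the initial random configuration of all parameters.
w0 = {name: p.detach().clone() for name, p in model.named_parameters()}

opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Synthetic regression data standing in for a real training set.
X, y = torch.randn(512, 20), torch.randn(512, 1)

for epoch in range(50):
    perm = torch.randperm(X.size(0))
    for i in range(0, X.size(0), 32):          # mini-batch SGD
        idx = perm[i:i + 32]
        opt.zero_grad()
        loss = loss_fn(model(X[idx]), y[idx])
        loss.backward()
        opt.step()

    # Per-parameter deviation from the initial configuration, pooled over
    # all weights and biases; its empirical distribution is the quantity
    # whose evolution the paper tracks alongside the loss.
    dev = torch.cat([(p.detach() - w0[n]).flatten()
                     for n, p in model.named_parameters()])
    if epoch % 10 == 0:
        print(f"epoch {epoch:2d}  loss {loss.item():.4f}  "
              f"mean|dw| {dev.abs().mean().item():.4f}  "
              f"max|dw| {dev.abs().max().item():.4f}")
```

Recording the full histogram of `dev` at successive epochs, rather than the two summary statistics printed here, would yield the kind of distribution-function evolution the paper analyzes.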
Keywords: training; evolution of weights; deep learning; neural networks; artificial intelligence
JEL-codes: C
Date: 2021
Downloads:
https://www.mdpi.com/2227-7390/9/18/2246/pdf (application/pdf)
https://www.mdpi.com/2227-7390/9/18/2246/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:9:y:2021:i:18:p:2246-:d:634243