Online Block Layer Decomposition schemes for training Deep Neural Networks
Laura Palagi and Ruggiero Seccia
Additional contact information
Ruggiero Seccia: Department of Computer, Control and Management Engineering Antonio Ruberti (DIAG), University of Rome La Sapienza, Rome, Italy
No 2019-06, DIAG Technical Reports from Department of Computer, Control and Management Engineering, Universita' degli Studi di Roma "La Sapienza"
Abstract:
Estimating the weights of Deep Feedforward Neural Networks (DFNNs) requires solving a very large nonconvex optimization problem that may have many local (non-global) minimizers, saddle points and large plateaus. Furthermore, the time needed to find good solutions to the training problem depends heavily on both the number of samples and the number of weights (variables). In this work, we show how Block Coordinate Descent (BCD) methods can be applied to improve the performance of state-of-the-art algorithms by avoiding bad stationary points and flat regions. We first describe a batch BCD method that effectively tackles the difficulties caused by the network's depth. We then extend the algorithm with an online BCD scheme that scales with respect to both the number of variables and the number of samples. We report extensive numerical experiments on standard datasets using different deep networks, and we show that applying (online) BCD methods to the training phase of DFNNs outperforms standard batch/online algorithms, improving both the training phase and the generalization performance of the networks.
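To make the idea of an online, layer-wise BCD scheme concrete, here is a minimal sketch under stated assumptions. It is not the authors' algorithm. It is a generic illustration in which each layer of a toy one-hidden-layer network is one block, and for every sampled mini-batch the blocks are visited in turn, taking a few plain gradient steps on one block while the other is frozen. The toy regression target (sin on [-2, 2]), the use of plain gradient descent as the inner solver, all hyperparameters, and the names `init_params`, `forward`, `mse`, `block_gradient` and `online_bcd` are hypothetical choices made for illustration only.

```python
import math
import random

# Hypothetical toy model: scalar input -> H tanh units -> scalar output.
# Parameters are four lists [W1, b1, w2, b2]; the layer blocks are
# block 0 = (W1, b1) (hidden layer) and block 1 = (w2, b2) (output layer).


def init_params(hidden, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(hidden)],  # W1
            [rng.uniform(-1.0, 1.0) for _ in range(hidden)],  # b1
            [rng.uniform(-1.0, 1.0) for _ in range(hidden)],  # w2
            [0.0]]                                             # b2


def forward(params, x):
    W1, b1, w2, b2 = params
    h = [math.tanh(W1[j] * x + b1[j]) for j in range(len(W1))]
    y = sum(w2[j] * h[j] for j in range(len(h))) + b2[0]
    return h, y


def mse(params, samples):
    return sum((forward(params, x)[1] - t) ** 2 for x, t in samples) / len(samples)


def block_gradient(params, batch, block):
    """Mini-batch MSE gradient with respect to ONE layer block only."""
    W1, _, w2, _ = params
    H = len(W1)
    g_w = [0.0] * H
    g_b = [0.0] * (H if block == 0 else 1)
    for x, t in batch:
        h, y = forward(params, x)
        r = 2.0 * (y - t) / len(batch)  # d(MSE)/dy for this sample
        for j in range(H):
            if block == 0:
                d = r * w2[j] * (1.0 - h[j] ** 2)  # chain rule through tanh
                g_w[j] += d * x
                g_b[j] += d
            else:
                g_w[j] += r * h[j]
        if block == 1:
            g_b[0] += r
    return g_w, g_b


def online_bcd(params, data, rng, epochs=200, batch_size=4, lr=0.05, inner_steps=2):
    """Online BCD sketch: for each mini-batch, cycle over the layer blocks and
    take a few gradient steps on one block while the other block is frozen."""
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            for block in (0, 1):  # Gauss-Seidel order: hidden, then output
                for _ in range(inner_steps):
                    g_w, g_b = block_gradient(params, batch, block)
                    w, b = params[2 * block], params[2 * block + 1]
                    for j in range(len(w)):  # in-place update of this block only
                        w[j] -= lr * g_w[j]
                    for j in range(len(b)):
                        b[j] -= lr * g_b[j]
    return params


rng = random.Random(0)
data = [(k / 10.0, math.sin(k / 10.0)) for k in range(-20, 21)]
params = init_params(8, rng)
initial = mse(params, data)
online_bcd(params, data, rng)
final = mse(params, data)
print(f"training MSE before: {initial:.4f}, after: {final:.4f}")
```

The block loop is what distinguishes this from ordinary mini-batch SGD. Each inner step touches only one layer's weights, and the output-layer block sees the hidden layer as already updated on the same mini-batch. In a deeper network, the same structure would apply with one block per layer (or per group of layers), and the per-block solver could be swapped for any optimizer.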
Keywords: Deep Feedforward Neural Networks; Block coordinate decomposition; Online Optimization; Large scale optimization
Date: 2019
New Economics Papers: this item is included in nep-big, nep-cmp and nep-ecm
Downloads:
http://users.diag.uniroma1.it/~biblioteca/sites/de ... ocuments/2019-06.pdf First version, 2019 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:aeg:report:2019-06
Bibliographic data for series maintained by Antonietta Angelica Zucconi.