Suboptimal Local Minima Exist for Wide Neural Networks with Smooth Activations

Tian Ding, Dawei Li and Ruoyu Sun
Additional contact information
Tian Ding: Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong
Dawei Li: Department of Industrial and Enterprise Systems Engineering and Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
Ruoyu Sun: Department of Industrial and Enterprise Systems Engineering and Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801

Mathematics of Operations Research, 2022, vol. 47, issue 4, 2784-2814

Abstract: Does a large width eliminate all suboptimal local minima for neural nets? An affirmative answer was given by a classic result published in 1995 for one-hidden-layer wide neural nets with a sigmoid activation function, but this result has not been extended to the multilayer case. Recently, it was shown that, with piecewise linear activations, suboptimal local minima exist even for wide nets. Given the classic positive result on smooth activations and the negative result on nonsmooth activations, an interesting open question is: Does a large width eliminate all suboptimal local minima for deep neural nets with smooth activations? In this paper, we give a largely negative answer to this question. Specifically, we prove that, for neural networks with generic input data and smooth nonlinear activation functions, suboptimal local minima can exist no matter how wide the network is (as long as the last hidden layer has at least two neurons). Therefore, the classic result of no suboptimal local minimum for a one-hidden-layer network does not hold. Whereas this classic result assumes a sigmoid activation, our counterexample covers a large set of activation functions (dense in the set of continuous functions), indicating that the limitation is not a result of the specific activation. Together with recent progress on piecewise linear activations, our result indicates that suboptimal local minima are common for wide neural nets.
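
To make the landscape question concrete, the following is a minimal LaTeX sketch of the one-hidden-layer setting the abstract alludes to; the notation (f_\theta, L, width m, activation \sigma) is illustrative and not taken from the paper itself.

% Illustrative empirical-risk landscape for a one-hidden-layer network of
% width m with a smooth activation \sigma, trained on data (x_i, y_i),
% i = 1, ..., n. All symbols here are assumed notation, not the paper's.
\[
  f_\theta(x) = \sum_{j=1}^{m} v_j \, \sigma\bigl(w_j^\top x + b_j\bigr),
  \qquad
  L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \bigl( f_\theta(x_i) - y_i \bigr)^2 .
\]
% The classic 1995 result states that, for a sigmoid \sigma and m large
% enough, L has no suboptimal local minima; the paper's counterexamples
% show that, for generic data and a dense set of smooth nonlinear
% activations, suboptimal local minima of L persist for every m >= 2.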

Keywords: local minimum; landscape; neural network; deep learning
MSC classification: Primary 90C26; Secondary 68T07, 49K10
Date: 2022

Downloads: http://dx.doi.org/10.1287/moor.2021.1228 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:47:y:2022:i:4:p:2784-2814
