Stationary Points of a Shallow Neural Network with Quadratic Activations and the Global Optimality of the Gradient Descent Algorithm

Gamarnik, David; Kızıldağ, Eren C.; Zadik, Ilias

Additional contact information
David Gamarnik: Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Eren C. Kızıldağ: Columbia University, New York, New York 10027
Ilias Zadik: Yale University, New Haven, Connecticut 06520

Mathematics of Operations Research, 2025, vol. 50, issue 1, 209-251

Abstract: We consider the problem of training a shallow neural network with quadratic activation functions and the generalization power of such trained networks. Assuming that the samples are generated by a full rank matrix W* of the hidden network node weights, we obtain the following results. We establish that all full-rank approximately stationary solutions of the risk minimization problem are also approximate global optimums of the risk (in-sample and population). As a consequence, we establish that, when trained on polynomially many samples, the gradient descent algorithm converges to the global optimum of the risk minimization problem regardless of the width of the network when it is initialized at some value ν*, which we compute. Furthermore, the network produced by the gradient descent has a near zero generalization error. Next, we establish that initializing the gradient descent algorithm below ν* is easily achieved when the weights of the ground truth matrix W* are randomly generated and the matrix is sufficiently overparameterized. Finally, we identify a simple necessary and sufficient geometric condition on the size of the training set under which any global minimizer of the empirical risk has necessarily zero generalization error.

Keywords: Primary: 68T07; 90C26; secondary: 60B20; neural networks; empirical risk minimization; gradient descent; optimization landscape; generalization; initialization; semicircle law
Date: 2025

Downloads:
http://dx.doi.org/10.1287/moor.2021.0082

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:50:y:2025:i:1:p:209-251

More articles in Mathematics of Operations Research from INFORMS