What is the gradient of a scalar function of a symmetric matrix?
Shriram Srinivasan and
Nishant Panda
Additional contact information
Shriram Srinivasan: Los Alamos National Laboratory
Nishant Panda: Los Alamos National Laboratory
Indian Journal of Pure and Applied Mathematics, 2023, vol. 54, issue 3, 907-919
Abstract:
For a real-valued function $\phi$ of a matrix argument, the gradient $\nabla \phi$ is calculated using a standard approach that follows from the definition of the Fréchet derivative for matrix functionals. In cases where the matrix argument is restricted to the space of symmetric matrices, the approach is easily modified to determine that the gradient ought to be $(\nabla \phi + \nabla \phi^T)/2$. However, perusal of research articles in the statistics and electrical engineering communities that deal with the topic of matrix calculus reveals a different approach that leads to a spurious result. In this approach, the gradient of $\phi$ is evaluated by explicitly taking into account the symmetry of the matrix, and this “symmetric gradient” $\nabla \phi_{sym}$ is reported to be related to the gradient $\nabla \phi$ computed by ignoring symmetry as $\nabla \phi_{sym} = \nabla \phi + \nabla \phi^T - \nabla \phi \circ I$, where $\circ$ denotes the elementwise Hadamard product of the two matrices and $I$ the identity matrix of the same size as $\nabla \phi$. The idea of the “symmetric gradient” has now appeared in several publications, as well as in textbooks and handbooks on matrix calculus which are often cited in this context. One of our important contributions has been to wade through the vague and confusing proofs of the result based on matrix calculus and cast the calculation of the “symmetric gradient” in a rigorous and concrete mathematical setting. After setting up the problem in a finite-dimensional inner-product space, we demonstrate rigorously that $\nabla \phi_{sym} = (\nabla \phi + \nabla \phi^T)/2$ is the correct relationship. Moreover, our derivation exposes that it is an incorrect lifting from the Euclidean space to the space of symmetric matrices, inconsistent with the underlying inner product, that leads to the spurious result. We also discuss the implications of using the spurious gradient in different classes of problems, such as those where the gradient itself is the quantity sought, or where it enters an optimization algorithm such as gradient descent. We show that the spurious gradient has a relative error of 100% in the off-diagonal components, which makes it an egregious error if the gradient is a quantity of interest; fortuitously, however, it proves to be an ascent direction, so its use in gradient descent may not lead to major issues.
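As a quick numerical illustration of the abstract's claim (not taken from the article itself), the sketch below uses a sample functional $\phi(X) = b^T X a$ on symmetric $X$, whose unconstrained gradient is $b a^T$. It compares the directional derivative along a symmetric perturbation with the Frobenius inner products against the corrected gradient $(\nabla \phi + \nabla \phi^T)/2$ and the spurious $\nabla \phi + \nabla \phi^T - \nabla \phi \circ I$; the choice of functional and all variable names are our own assumptions.

```python
# Minimal numerical sketch (our own example, not the article's): compare the
# corrected "symmetric gradient" (grad + grad.T)/2 with the spurious formula
# grad + grad.T - grad ∘ I for phi(X) = b^T X a, whose unconstrained gradient
# is the outer product b a^T.
import numpy as np

rng = np.random.default_rng(0)
n = 4
a, b = rng.standard_normal(n), rng.standard_normal(n)

def phi(X):
    return b @ X @ a

grad = np.outer(b, a)                              # gradient ignoring symmetry
grad_sym = 0.5 * (grad + grad.T)                   # corrected symmetric gradient
grad_spurious = grad + grad.T - grad * np.eye(n)   # formula criticized in the paper

# The Fréchet derivative along a symmetric perturbation H should equal
# <G, H>_F for the true gradient G on the space of symmetric matrices.
H = rng.standard_normal((n, n)); H = 0.5 * (H + H.T)
X = rng.standard_normal((n, n)); X = 0.5 * (X + X.T)
t = 1e-6
dir_deriv = (phi(X + t * H) - phi(X)) / t

print("directional derivative      :", dir_deriv)
print("<grad_sym, H>      (matches):", np.sum(grad_sym * H))
print("<grad_spurious, H> (differs):", np.sum(grad_spurious * H))

# Off-diagonal entries of the spurious gradient are twice the correct ones
# (100% relative error), while the diagonal agrees.
off = ~np.eye(n, dtype=bool)
print("max rel. error off-diagonal :",
      np.max(np.abs(grad_spurious[off] - grad_sym[off]) / np.abs(grad_sym[off])))
```

Under these assumptions the first two printed values agree to finite-difference accuracy, while the off-diagonal relative error of the spurious gradient prints as 1.0, i.e. the 100% error stated in the abstract.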
Keywords: Matrix calculus; Symmetric matrix; Fréchet derivative; Gradient; Matrix functional; 15A60; 15A63; 26B12
Date: 2023
Downloads: http://link.springer.com/10.1007/s13226-022-00313-x (abstract, text/html; access to the full text of articles in this series is restricted)
Persistent link: https://EconPapers.repec.org/RePEc:spr:indpam:v:54:y:2023:i:3:d:10.1007_s13226-022-00313-x
Ordering information: This journal article can be ordered from
https://www.springer.com/journal/13226
DOI: 10.1007/s13226-022-00313-x
Indian Journal of Pure and Applied Mathematics is currently edited by Nidhi Chandhoke
More articles in Indian Journal of Pure and Applied Mathematics from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.